All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes
@ 2016-06-03 22:42 Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set Jacob Keller
                   ` (17 more replies)
  0 siblings, 18 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

This series implements several fixes for suspend/resume and related
behaviors for both PF and VF driver. The major changes are to commonize
the suspend/resume, io_error_detected, reset_notify and reinit flows to
use one set of functions. This ensures that all flows set the same
state, preventing multiple flows from interacting together. In addition
it ensures that each flow does every step that it needs. This was
previously overlooked because each flow was coded separately.

In addition, there are several bug fixes scattered around the series for
issues that came up when testing the changes and the bugs that these
changes fix.

Jacob Keller (17):
  fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set
  fm10k: avoid possible null pointer dereference in fm10k_update_stats
  fm10k: prevent multiple threads updating statistics
  fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING
  fm10k: perform data path reset even when switch is not ready
  fm10k: use actual hardware registers when checking for pending Tx
  fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING
  fm10k: wait for queues to drain fully before stop_hw
  fm10k: split fm10k_reinit into two functions
  fm10k: implement prepare_suspend and handle_resume
  fm10k: use common reset flow when handling io errors from PCI stack
  fm10k: implement reset_notify handler for PCIe FLR events
  fm10k: use common flow for suspend and resume
  fm10k: enable bus master after every reset
  fm10k: check if PCIe link is restored
  fm10k: implement request_lport_map pointer
  fm10k: bump version number

Ngai-Mint Kwan (1):
  fm10k: Reset mailbox global interrupts

 drivers/net/ethernet/intel/fm10k/fm10k.h         |   2 +
 drivers/net/ethernet/intel/fm10k/fm10k_common.c  |   6 +-
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c |   2 +
 drivers/net/ethernet/intel/fm10k/fm10k_main.c    |  12 +-
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.h     |   2 +
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c     | 311 +++++++++++++----------
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c      |  38 ++-
 drivers/net/ethernet/intel/fm10k/fm10k_type.h    |   2 +
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c      |  12 +-
 9 files changed, 222 insertions(+), 165 deletions(-)

-- 
2.8.2.820.gd1c5f70


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-07-11 19:23   ` Singh, Krishneil K
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats Jacob Keller
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Return early from fm10k_down() when we are already down, since that
means another thread is either already finished or has started going
down, so shouldn't conflict with them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index e05aca9bef0e..610c313610c8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1601,7 +1601,8 @@ void fm10k_down(struct fm10k_intfc *interface)
 	int err;
 
 	/* signal that we are down to the interrupt handler and service task */
-	set_bit(__FM10K_DOWN, &interface->state);
+	if (test_and_set_bit(__FM10K_DOWN, &interface->state))
+		return;
 
 	/* call carrier off first to avoid false dev_watchdog timeouts */
 	netif_carrier_off(netdev);
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-07-11 19:24   ` Singh, Krishneil K
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 03/18] fm10k: prevent multiple threads updating statistics Jacob Keller
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

It's currently possible for fm10k_update_stats to be called during the
window when we go down and the rings are removed. This can result in
a null pointer dereference. In fm10k_get_stats64 we work around this by
using ACCESS_ONCE and a null pointer check inside the loop. Use this
same flow in the fm10k_update_stats to avoid the potential null pointer.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 610c313610c8..be0b7dea6e72 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -377,7 +377,10 @@ void fm10k_update_stats(struct fm10k_intfc *interface)
 
 	/* gather some stats to the interface struct that are per queue */
 	for (bytes = 0, pkts = 0, i = 0; i < interface->num_tx_queues; i++) {
-		struct fm10k_ring *tx_ring = interface->tx_ring[i];
+		struct fm10k_ring *tx_ring = READ_ONCE(interface->tx_ring[i]);
+
+		if (!tx_ring)
+			continue;
 
 		restart_queue += tx_ring->tx_stats.restart_queue;
 		tx_busy += tx_ring->tx_stats.tx_busy;
@@ -396,7 +399,10 @@ void fm10k_update_stats(struct fm10k_intfc *interface)
 
 	/* gather some stats to the interface struct that are per queue */
 	for (bytes = 0, pkts = 0, i = 0; i < interface->num_rx_queues; i++) {
-		struct fm10k_ring *rx_ring = interface->rx_ring[i];
+		struct fm10k_ring *rx_ring = READ_ONCE(interface->rx_ring[i]);
+
+		if (!rx_ring)
+			continue;
 
 		bytes += rx_ring->stats.bytes;
 		pkts += rx_ring->stats.packets;
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 03/18] fm10k: prevent multiple threads updating statistics
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 04/18] fm10k: Reset mailbox global interrupts Jacob Keller
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Also prevent updating stats while the interface is down. If we're
already updating stats, just return doing nothing. When we take the
device down, block stat updates until we come back up. This ensures that
we avoid tearing down rings when we're updating statistics, and prevents
updating statistics until we're up.

We can't re-use the __FM10K_DOWN for this because it wouldn't prevent
multiple threads from accessing statistics. Neither does it prevent the
case where we start updating stats and then start going down in another
thread.

The fm10k_get_stats64 is except from this, because it has a completely
different flow which does not suffer from the same issues as
fm10k_update_stats might.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h     |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index e98b86bf0ca1..c8d0817766bf 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -362,6 +362,7 @@ enum fm10k_state_t {
 	__FM10K_SERVICE_DISABLE,
 	__FM10K_MBX_LOCK,
 	__FM10K_LINK_DOWN,
+	__FM10K_UPDATING_STATS,
 };
 
 static inline void fm10k_mbx_lock(struct fm10k_intfc *interface)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index be0b7dea6e72..44393763fe77 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -372,6 +372,10 @@ void fm10k_update_stats(struct fm10k_intfc *interface)
 	u64 bytes, pkts;
 	int i;
 
+	/* ensure only one thread updates stats at a time */
+	if (test_and_set_bit(__FM10K_UPDATING_STATS, &interface->state))
+		return;
+
 	/* do not allow stats update via service task for next second */
 	interface->next_stats_update = jiffies + HZ;
 
@@ -449,6 +453,8 @@ void fm10k_update_stats(struct fm10k_intfc *interface)
 	/* Fill out the OS statistics structure */
 	net_stats->rx_errors = rx_errors;
 	net_stats->rx_dropped = interface->stats.nodesc_drop.count;
+
+	clear_bit(__FM10K_UPDATING_STATS, &interface->state);
 }
 
 /**
@@ -1572,6 +1578,9 @@ void fm10k_up(struct fm10k_intfc *interface)
 	/* configure interrupts */
 	hw->mac.ops.update_int_moderator(hw);
 
+	/* enable statistics capture again */
+	clear_bit(__FM10K_UPDATING_STATS, &interface->state);
+
 	/* clear down bit to indicate we are ready to go */
 	clear_bit(__FM10K_DOWN, &interface->state);
 
@@ -1629,6 +1638,10 @@ void fm10k_down(struct fm10k_intfc *interface)
 	/* capture stats one last time before stopping interface */
 	fm10k_update_stats(interface);
 
+	/* prevent updating statistics while we're down */
+	while (test_and_set_bit(__FM10K_UPDATING_STATS, &interface->state))
+		usleep_range(1000, 2000);
+
 	/* Disable DMA engine for Tx/Rx */
 	err = hw->mac.ops.stop_hw(hw);
 	if (err)
@@ -1757,6 +1770,7 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
 
 	/* Start off interface as being down */
 	set_bit(__FM10K_DOWN, &interface->state);
+	set_bit(__FM10K_UPDATING_STATS, &interface->state);
 
 	return 0;
 }
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 04/18] fm10k: Reset mailbox global interrupts
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (2 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 03/18] fm10k: prevent multiple threads updating statistics Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 05/18] fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING Jacob Keller
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

From: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>

When a data path reset is initiated, write control to the PCIE_GMBX is
yanked from the switch manager. The switch manager writes to this
register to clear mailbox global interrupt bits as part of its mailbox
interrupt handling routine. When the device recovers from the data path
reset and these bits are not cleared, it will prevent future mailbox
global interrupts from being triggered. Upon confirming that the device
has exited from a data path reset, clear these bits to ensure the proper
functioning of the mailbox global interrupt.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.h | 2 ++
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
index b7dbc8a84c05..35c1dbad1330 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
@@ -41,6 +41,8 @@ struct fm10k_mbx_info;
 #define FM10K_MBX_ACK_INTERRUPT			0x00000010
 #define FM10K_MBX_INTERRUPT_ENABLE		0x00000020
 #define FM10K_MBX_INTERRUPT_DISABLE		0x00000040
+#define FM10K_MBX_GLOBAL_REQ_INTERRUPT		0x00000200
+#define FM10K_MBX_GLOBAL_ACK_INTERRUPT		0x00000400
 #define FM10K_MBICR(_n)		((_n) + 0x18840)
 #define FM10K_GMBX		0x18842
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index dc75507c9926..69e2c822db00 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -77,6 +77,10 @@ static s32 fm10k_reset_hw_pf(struct fm10k_hw *hw)
 	if (!(reg & FM10K_IP_NOTINRESET))
 		err = FM10K_ERR_RESET_FAILED;
 
+	/* Reset mailbox global interrupts */
+	reg = FM10K_MBX_GLOBAL_REQ_INTERRUPT | FM10K_MBX_GLOBAL_ACK_INTERRUPT;
+	fm10k_write_reg(hw, FM10K_GMBX, reg);
+
 out:
 	return err;
 }
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 05/18] fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (3 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 04/18] fm10k: Reset mailbox global interrupts Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 06/18] fm10k: perform data path reset even when switch is not ready Jacob Keller
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Don't report FM10K_ERR_REQUESTS_PENDING when we fail to disable queues
within the timeout. This can occur due to a hardware Tx hang, or when
the switch ethernet fabric is resetting while we are transmitting
traffic. It can sometimes take up to 500ms before the Tx DMA engine
gives up. Instead, just skip the DMA engine check and perform
a data-path reset anyways. Add a statistic counter to keep track of the
number of resets occurring while we have pending DMA on the rings.

In order to prevent having to re-assign err to 0, re-order the
last few items of the reset_hw_pf function so that we don't perform
"return err" at the end.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c |  2 ++
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c      | 24 ++++++++++++++----------
 drivers/net/ethernet/intel/fm10k/fm10k_type.h    |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_vf.c      | 12 +++++++-----
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 9b5195435c87..c04cbe9c9f7c 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -76,6 +76,8 @@ static const struct fm10k_stats fm10k_gstrings_global_stats[] = {
 	FM10K_STAT("mac_rules_used", hw.swapi.mac.used),
 	FM10K_STAT("mac_rules_avail", hw.swapi.mac.avail),
 
+	FM10K_STAT("reset_while_pending", hw.mac.reset_while_pending),
+
 	FM10K_STAT("tx_hang_count", tx_timeout_count),
 };
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 69e2c822db00..7fbd94ba6745 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -51,8 +51,12 @@ static s32 fm10k_reset_hw_pf(struct fm10k_hw *hw)
 
 	/* shut down all rings */
 	err = fm10k_disable_queues_generic(hw, FM10K_MAX_QUEUES);
-	if (err)
+	if (err == FM10K_ERR_REQUESTS_PENDING) {
+		hw->mac.reset_while_pending++;
+		goto force_reset;
+	} else if (err) {
 		return err;
+	}
 
 	/* Verify that DMA is no longer active */
 	reg = fm10k_read_reg(hw, FM10K_DMA_CTRL);
@@ -62,27 +66,27 @@ static s32 fm10k_reset_hw_pf(struct fm10k_hw *hw)
 	/* verify the switch is ready for reset */
 	reg = fm10k_read_reg(hw, FM10K_DMA_CTRL2);
 	if (!(reg & FM10K_DMA_CTRL2_SWITCH_READY))
-		goto out;
+		return FM10K_ERR_DMA_PENDING;
 
+force_reset:
 	/* Inititate data path reset */
-	reg |= FM10K_DMA_CTRL_DATAPATH_RESET;
+	reg = FM10K_DMA_CTRL_DATAPATH_RESET;
 	fm10k_write_reg(hw, FM10K_DMA_CTRL, reg);
 
 	/* Flush write and allow 100us for reset to complete */
 	fm10k_write_flush(hw);
 	udelay(FM10K_RESET_TIMEOUT);
 
-	/* Verify we made it out of reset */
-	reg = fm10k_read_reg(hw, FM10K_IP);
-	if (!(reg & FM10K_IP_NOTINRESET))
-		err = FM10K_ERR_RESET_FAILED;
-
 	/* Reset mailbox global interrupts */
 	reg = FM10K_MBX_GLOBAL_REQ_INTERRUPT | FM10K_MBX_GLOBAL_ACK_INTERRUPT;
 	fm10k_write_reg(hw, FM10K_GMBX, reg);
 
-out:
-	return err;
+	/* Verify we made it out of reset */
+	reg = fm10k_read_reg(hw, FM10K_IP);
+	if (!(reg & FM10K_IP_NOTINRESET))
+		return FM10K_ERR_RESET_FAILED;
+
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index b8bc06183720..1d65ad85d72e 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -562,6 +562,7 @@ struct fm10k_mac_info {
 	bool tx_ready;
 	u32 dglort_map;
 	u8 itr_scale;
+	u64 reset_while_pending;
 };
 
 struct fm10k_swapi_table_info {
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index 3b06685ea63b..337ba65a9411 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -34,7 +34,7 @@ static s32 fm10k_stop_hw_vf(struct fm10k_hw *hw)
 
 	/* we need to disable the queues before taking further steps */
 	err = fm10k_stop_hw_generic(hw);
-	if (err)
+	if (err && err != FM10K_ERR_REQUESTS_PENDING)
 		return err;
 
 	/* If permanent address is set then we need to restore it */
@@ -67,7 +67,7 @@ static s32 fm10k_stop_hw_vf(struct fm10k_hw *hw)
 		fm10k_write_reg(hw, FM10K_TDLEN(i), tdlen);
 	}
 
-	return 0;
+	return err;
 }
 
 /**
@@ -83,7 +83,9 @@ static s32 fm10k_reset_hw_vf(struct fm10k_hw *hw)
 
 	/* shut down queues we own and reset DMA configuration */
 	err = fm10k_stop_hw_vf(hw);
-	if (err)
+	if (err == FM10K_ERR_REQUESTS_PENDING)
+		hw->mac.reset_while_pending++;
+	else if (err)
 		return err;
 
 	/* Inititate VF reset */
@@ -96,9 +98,9 @@ static s32 fm10k_reset_hw_vf(struct fm10k_hw *hw)
 	/* Clear reset bit and verify it was cleared */
 	fm10k_write_reg(hw, FM10K_VFCTRL, 0);
 	if (fm10k_read_reg(hw, FM10K_VFCTRL) & FM10K_VFCTRL_RST)
-		err = FM10K_ERR_RESET_FAILED;
+		return FM10K_ERR_RESET_FAILED;
 
-	return err;
+	return 0;
 }
 
 /**
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 06/18] fm10k: perform data path reset even when switch is not ready
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (4 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 05/18] fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 07/18] fm10k: use actual hardware registers when checking for pending Tx Jacob Keller
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

A while ago, an additional check for the switch being ready was added to
reset_hw. A recent refactor accidentally made this check return an error
code on failure which caused fm10k_probe to fail when the switch wasn't
brought up first. The original reasoning for the check was to prevent
additional data path reset when the fabric wasn't ready yet. However,
there isn't a compelling reason to keep the check, as the data path
reset will restore hardware to a known good state. Remove the check and
perform the data path reset regardless of the switch manager state.

An alternative fix is to return FM10K_SUCCESS instead, and bypass the
actual data path reset. This should be fine as we will perform
a reset_hw once the switch is active. However, since data path reset
will reset many parts of the hardware it seems better to just perform
the reset regardless of switch state.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 7fbd94ba6745..23f3566b17fc 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -63,11 +63,6 @@ static s32 fm10k_reset_hw_pf(struct fm10k_hw *hw)
 	if (reg & (FM10K_DMA_CTRL_TX_ACTIVE | FM10K_DMA_CTRL_RX_ACTIVE))
 		return FM10K_ERR_DMA_PENDING;
 
-	/* verify the switch is ready for reset */
-	reg = fm10k_read_reg(hw, FM10K_DMA_CTRL2);
-	if (!(reg & FM10K_DMA_CTRL2_SWITCH_READY))
-		return FM10K_ERR_DMA_PENDING;
-
 force_reset:
 	/* Inititate data path reset */
 	reg = FM10K_DMA_CTRL_DATAPATH_RESET;
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 07/18] fm10k: use actual hardware registers when checking for pending Tx
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (5 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 06/18] fm10k: perform data path reset even when switch is not ready Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 08/18] fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING Jacob Keller
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index a9ccc1eb3ea4..c6a464551577 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1130,9 +1130,11 @@ static u64 fm10k_get_tx_completed(struct fm10k_ring *ring)
 
 static u64 fm10k_get_tx_pending(struct fm10k_ring *ring)
 {
-	/* use SW head and tail until we have real hardware */
-	u32 head = ring->next_to_clean;
-	u32 tail = ring->next_to_use;
+	struct fm10k_intfc *interface = ring->q_vector->interface;
+	struct fm10k_hw *hw = &interface->hw;
+
+	u32 head = fm10k_read_reg(hw, FM10K_TDH(ring->reg_idx));
+	u32 tail = fm10k_read_reg(hw, FM10K_TDT(ring->reg_idx));
 
 	return ((head <= tail) ? tail : tail + ring->count) - head;
 }
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 08/18] fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (6 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 07/18] fm10k: use actual hardware registers when checking for pending Tx Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw Jacob Keller
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

When stop_hw() routine fails with FM10K_ERR_REQUESTS_PENDING, this
indicates that the Tx or Rx queues did not shutdown within the time
limit. Print a more suitable message at the dev_info level instead of
dev_err.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 44393763fe77..4dfd1284a8de 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1644,7 +1644,10 @@ void fm10k_down(struct fm10k_intfc *interface)
 
 	/* Disable DMA engine for Tx/Rx */
 	err = hw->mac.ops.stop_hw(hw);
-	if (err)
+	if (err == FM10K_ERR_REQUESTS_PENDING)
+		dev_info(&interface->pdev->dev,
+			 "due to pending requests hw was not shut down gracefully\n");
+	else if (err)
 		dev_err(&interface->pdev->dev, "stop_hw failed: %d\n", err);
 
 	/* free any buffers still on the rings */
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (7 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 08/18] fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-06 17:38   ` Keller, Jacob E
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 10/18] fm10k: split fm10k_reinit into two functions Jacob Keller
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

It turns out that VFs also suffer from the same queue issue as PFs, and
can't simply perform a datapath reset. Add a looping delay for queueus
which will sleep and then check the queues to see if they are drained
before calling stop_hw. If they take longer than 50 loops (~500ms,
longest delay I found necessary while testing), issue a dev_err message
and continue with the reset. For PFs, this should trigger a datapath
reset and a recovery. For VFs this will likely cause the VF to be stuck
and need re-creating. Unfortunately there isn't anything else the driver
can do in response.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h      |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |  2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 37 +++++++++++++++++++++++----
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index c8d0817766bf..c4cf08dcf5af 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -458,6 +458,7 @@ __be16 fm10k_tx_encap_offload(struct sk_buff *skb);
 netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
 				  struct fm10k_ring *tx_ring);
 void fm10k_tx_timeout_reset(struct fm10k_intfc *interface);
+u64 fm10k_get_tx_pending(struct fm10k_ring *ring);
 bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring);
 void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count);
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index c6a464551577..c85fc98945fa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1128,7 +1128,7 @@ static u64 fm10k_get_tx_completed(struct fm10k_ring *ring)
 	return ring->stats.packets;
 }
 
-static u64 fm10k_get_tx_pending(struct fm10k_ring *ring)
+u64 fm10k_get_tx_pending(struct fm10k_ring *ring)
 {
 	struct fm10k_intfc *interface = ring->q_vector->interface;
 	struct fm10k_hw *hw = &interface->hw;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 4dfd1284a8de..469122cdb7db 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1613,7 +1613,7 @@ void fm10k_down(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
 	struct fm10k_hw *hw = &interface->hw;
-	int err;
+	int err, i = 0, count = 0;
 
 	/* signal that we are down to the interrupt handler and service task */
 	if (test_and_set_bit(__FM10K_DOWN, &interface->state))
@@ -1629,9 +1629,36 @@ void fm10k_down(struct fm10k_intfc *interface)
 	/* reset Rx filters */
 	fm10k_reset_rx_state(interface);
 
-	/* allow 10ms for device to quiesce */
-	usleep_range(10000, 20000);
+	/* skip waiting for TX DMA if we lost PCIe link */
+	if (FM10K_REMOVED(hw->hw_addr))
+		goto skip_tx_dma_drain;
 
+	/* in some circumstances it can take up to 500ms for the Tx queues to
+	 * quiesce and stop the Tx DMA engine. To avoid forcing a long sleep,
+	 * we'll repeat checking the Tx queues every few milliseconds. If we
+	 * exceed too long a delay we'll log an error message. For PF, this
+	 * should result in a data path reset. For VF, this may result in the
+	 * VF being disabled, as there is no equivalent data path reset.
+	 */
+#define TX_DMA_DRAIN_RETRIES 50
+	for (count = 0; count < TX_DMA_DRAIN_RETRIES; count++) {
+		usleep_range(10000, 20000);
+
+		/* start checking@the last ring to have pending Tx */
+		for (; i < interface->num_tx_queues; i++)
+			if (fm10k_get_tx_pending(interface->tx_ring[i]))
+				break;
+
+		/* if all the queues are drained, we can break now */
+		if (i == interface->num_tx_queues)
+			break;
+	}
+
+	if (count == TX_DMA_DRAIN_RETRIES)
+		dev_err(&interface->pdev->dev,
+			"Tx queues failed to drain after ~500ms. Tx DMA is probably hung.\n");
+
+skip_tx_dma_drain:
 	/* disable polling routines */
 	fm10k_napi_disable_all(interface);
 
@@ -1645,8 +1672,8 @@ void fm10k_down(struct fm10k_intfc *interface)
 	/* Disable DMA engine for Tx/Rx */
 	err = hw->mac.ops.stop_hw(hw);
 	if (err == FM10K_ERR_REQUESTS_PENDING)
-		dev_info(&interface->pdev->dev,
-			 "due to pending requests hw was not shut down gracefully\n");
+		dev_err(&interface->pdev->dev,
+			"due to pending requests hw was not shut down gracefully\n");
 	else if (err)
 		dev_err(&interface->pdev->dev, "stop_hw failed: %d\n", err);
 
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 10/18] fm10k: split fm10k_reinit into two functions
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (8 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 11/18] fm10k: implement prepare_suspend and handle_resume Jacob Keller
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

There are several flows in the driver which perform the similar function
of tearing down software and restoring software to recover from certain
errors or PCIe events, including:

  * fm10k_reinit
  * fm10k_suspend/resume
  * fm10k_io_error_detected/fm10k_io_resume

In addition, we want to implement a .reset_notify() handler as well
which will also perform similar function.

Rework how the driver codes reset and resume flows by separating out the
reinit logic into two functions "fm10k_prepare_for_reset" and
"fm10k_handle_reset". This first step will allow us to re-use this
functionality in the similar blocks of code instead of re-coding the
same sequence of events slightly different.

The end result should be more maintainable and correct, fixing several
inconsistencies with the work flow.

The new functions expect to take the rtnl_lock() themselves, and it does
have the unfortunate side effect of having the reinit flow take then
release then take the rtnl_lock. However, this minor downside is
outweighted by the benefits of code reduction and reducing needless
difference between these flows.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 33 +++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 469122cdb7db..95a654adca84 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -136,11 +136,9 @@ static void fm10k_detach_subtask(struct fm10k_intfc *interface)
 	rtnl_unlock();
 }
 
-static void fm10k_reinit(struct fm10k_intfc *interface)
+static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
-	struct fm10k_hw *hw = &interface->hw;
-	int err;
 
 	WARN_ON(in_interrupt());
 
@@ -165,6 +163,17 @@ static void fm10k_reinit(struct fm10k_intfc *interface)
 	/* delay any future reset requests */
 	interface->last_reset = jiffies + (10 * HZ);
 
+	rtnl_unlock();
+}
+
+static int fm10k_handle_reset(struct fm10k_intfc *interface)
+{
+	struct net_device *netdev = interface->netdev;
+	struct fm10k_hw *hw = &interface->hw;
+	int err;
+
+	rtnl_lock();
+
 	/* reset and initialize the hardware so it is in a known state */
 	err = hw->mac.ops.reset_hw(hw);
 	if (err) {
@@ -185,7 +194,7 @@ static void fm10k_reinit(struct fm10k_intfc *interface)
 		goto reinit_err;
 	}
 
-	/* reassociate interrupts */
+	/* re-associate interrupts */
 	err = fm10k_mbx_request_irq(interface);
 	if (err)
 		goto err_mbx_irq;
@@ -219,7 +228,7 @@ static void fm10k_reinit(struct fm10k_intfc *interface)
 
 	clear_bit(__FM10K_RESETTING, &interface->state);
 
-	return;
+	return err;
 err_open:
 	fm10k_mbx_free_irq(interface);
 err_mbx_irq:
@@ -230,6 +239,20 @@ reinit_err:
 	rtnl_unlock();
 
 	clear_bit(__FM10K_RESETTING, &interface->state);
+
+	return err;
+}
+
+static void fm10k_reinit(struct fm10k_intfc *interface)
+{
+	int err;
+
+	fm10k_prepare_for_reset(interface);
+
+	err = fm10k_handle_reset(interface);
+	if (err)
+		dev_err(&interface->pdev->dev,
+			"fm10k_handle_reset failed: %d\n", err);
 }
 
 static void fm10k_reset_subtask(struct fm10k_intfc *interface)
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 11/18] fm10k: implement prepare_suspend and handle_resume
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (9 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 10/18] fm10k: split fm10k_reinit into two functions Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 12/18] fm10k: use common reset flow when handling io errors from PCI stack Jacob Keller
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Implement fm10k_prepare_suspend and fm10k_handle_resume functions which
abstract around the now existing fm10k_prepare_for_reset and
fm10k_handle_reset. The new functions also handle stopping the service
task, which is something that the original re-init flow does not need.

Every other location that does a suspend/resume type flow is expected to
use these functions, because otherwise they may have conflicts with the
running watchdog routines. This also has the effect of preventing
possible surprise remove events during handling of FLR events and PCIe
errors.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 38 ++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 95a654adca84..ab26bfa0a342 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2107,6 +2107,44 @@ static void fm10k_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
+static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
+{
+	/* the watchdog task reads from registers, which might appear like
+	 * a surprise remove if the PCIe device is disabled while we're
+	 * stopped. We stop the watchdog task until after we resume software
+	 * activity.
+	 */
+	set_bit(__FM10K_SERVICE_DISABLE, &interface->state);
+	cancel_work_sync(&interface->service_task);
+
+	fm10k_prepare_for_reset(interface);
+}
+
+static int fm10k_handle_resume(struct fm10k_intfc *interface)
+{
+	struct fm10k_hw *hw = &interface->hw;
+	int err;
+
+	/* reset statistics starting values */
+	hw->mac.ops.rebind_hw_stats(hw, &interface->stats);
+
+	err = fm10k_handle_reset(interface);
+	if (err)
+		return err;
+
+	/* assume host is not ready, to prevent race with watchdog in case we
+	 * actually don't have connection to the switch
+	 */
+	interface->host_ready = false;
+	fm10k_watchdog_host_not_ready(interface);
+
+	/* clear the service task disable bit to allow service task to start */
+	clear_bit(__FM10K_SERVICE_DISABLE, &interface->state);
+	fm10k_service_event_schedule(interface);
+
+	return err;
+}
+
 #ifdef CONFIG_PM
 /**
  * fm10k_resume - Restore device to pre-sleep state
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 12/18] fm10k: use common reset flow when handling io errors from PCI stack
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (10 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 11/18] fm10k: implement prepare_suspend and handle_resume Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 13/18] fm10k: implement reset_notify handler for PCIe FLR events Jacob Keller
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Now that we have extracted the necessary steps for a split
suspend/resume flow, re-use these functions instead of using the current
open coded flow. This ensures that we don't miss any steps. It also
ensures that we have the correct driver states set.

Since we'll be handling all of the reset flow ourselves, we no longer
need to request a reset in the io_slot_reset() function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 60 ++++------------------------
 1 file changed, 7 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index ab26bfa0a342..0ab20d2c4ea6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2307,17 +2307,7 @@ static pci_ers_result_t fm10k_io_error_detected(struct pci_dev *pdev,
 	if (state == pci_channel_io_perm_failure)
 		return PCI_ERS_RESULT_DISCONNECT;
 
-	rtnl_lock();
-
-	if (netif_running(netdev))
-		fm10k_close(netdev);
-
-	fm10k_mbx_free_irq(interface);
-
-	/* free interrupts */
-	fm10k_clear_queueing_scheme(interface);
-
-	rtnl_unlock();
+	fm10k_prepare_suspend(interface);
 
 	/* Request a slot reset. */
 	return PCI_ERS_RESULT_NEED_RESET;
@@ -2331,7 +2321,6 @@ static pci_ers_result_t fm10k_io_error_detected(struct pci_dev *pdev,
  */
 static pci_ers_result_t fm10k_io_slot_reset(struct pci_dev *pdev)
 {
-	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
 	pci_ers_result_t result;
 
 	if (pci_enable_device_mem(pdev)) {
@@ -2349,12 +2338,6 @@ static pci_ers_result_t fm10k_io_slot_reset(struct pci_dev *pdev)
 
 		pci_wake_from_d3(pdev, false);
 
-		/* refresh hw_addr in case it was dropped */
-		interface->hw.hw_addr = interface->uc_addr;
-
-		interface->flags |= FM10K_FLAG_RESET_REQUESTED;
-		fm10k_service_event_schedule(interface);
-
 		result = PCI_ERS_RESULT_RECOVERED;
 	}
 
@@ -2374,44 +2357,15 @@ static void fm10k_io_resume(struct pci_dev *pdev)
 {
 	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
 	struct net_device *netdev = interface->netdev;
-	struct fm10k_hw *hw = &interface->hw;
-	int err = 0;
+	int err;
 
-	/* reset hardware to known state */
-	err = hw->mac.ops.init_hw(&interface->hw);
-	if (err) {
-		dev_err(&pdev->dev, "init_hw failed: %d\n", err);
-		return;
-	}
+	err = fm10k_handle_resume(interface);
 
-	/* reset statistics starting values */
-	hw->mac.ops.rebind_hw_stats(hw, &interface->stats);
-
-	rtnl_lock();
-
-	err = fm10k_init_queueing_scheme(interface);
-	if (err) {
-		dev_err(&interface->pdev->dev,
-			"init_queueing_scheme failed: %d\n", err);
-		goto unlock;
-	}
-
-	/* reassociate interrupts */
-	fm10k_mbx_request_irq(interface);
-
-	rtnl_lock();
-	if (netif_running(netdev))
-		err = fm10k_open(netdev);
-	rtnl_unlock();
-
-	/* final check of hardware state before registering the interface */
-	err = err ? : fm10k_hw_ready(interface);
-
-	if (!err)
+	if (err)
+		dev_warn(&pdev->dev,
+			 "fm10k_io_resume failed: %d\n", err);
+	else
 		netif_device_attach(netdev);
-
-unlock:
-	rtnl_unlock();
 }
 
 static const struct pci_error_handlers fm10k_err_handler = {
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 13/18] fm10k: implement reset_notify handler for PCIe FLR events
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (11 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 12/18] fm10k: use common reset flow when handling io errors from PCI stack Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 14/18] fm10k: use common flow for suspend and resume Jacob Keller
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

When a function level PCI reset is triggered using sysfs, it calls the
driver's .reset_notify error handler. Implement a handler based on the
now split fm10k_prepare_for_reset and fm10k_handle_reset functions, so
that we fully reset the driver when the PCI function level reset occurs.
This also ensures the reset is handled in a clean way by first disabling
all the driver bits first and then restoring them after the function
reset. Previously the stack simply performed a blind function reset and
our driver didn't take any part in the process.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 33 ++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 0ab20d2c4ea6..10fd3f689728 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2368,10 +2368,43 @@ static void fm10k_io_resume(struct pci_dev *pdev)
 		netif_device_attach(netdev);
 }
 
+/**
+ * fm10k_io_reset_notify - called when PCI function is reset
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the PCI function is reset such as from
+ * /sys/class/net/<enpX>/device/reset or similar. When prepare is true, it
+ * means we should prepare for a function reset. If prepare is false, it means
+ * the function reset just occurred.
+ */
+static void fm10k_io_reset_notify(struct pci_dev *pdev, bool prepare)
+{
+	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+	int err = 0;
+
+	if (prepare) {
+		/* warn incase we have any active VF devices */
+		if (pci_num_vf(pdev))
+			dev_warn(&pdev->dev,
+				 "PCIe FLR may cause issues for any active VF devices\n");
+
+		fm10k_prepare_suspend(interface);
+	} else {
+		err = fm10k_handle_resume(interface);
+	}
+
+	if (err) {
+		dev_warn(&pdev->dev,
+			 "fm10k_io_reset_notify failed: %d\n", err);
+		netif_device_detach(interface->netdev);
+	}
+}
+
 static const struct pci_error_handlers fm10k_err_handler = {
 	.error_detected = fm10k_io_error_detected,
 	.slot_reset = fm10k_io_slot_reset,
 	.resume = fm10k_io_resume,
+	.reset_notify = fm10k_io_reset_notify,
 };
 
 static struct pci_driver fm10k_driver = {
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 14/18] fm10k: use common flow for suspend and resume
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (12 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 13/18] fm10k: implement reset_notify handler for PCIe FLR events Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 15/18] fm10k: enable bus master after every reset Jacob Keller
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Continuing the effort to commonize the similar suspend/resume flows,
finish up by using the new fm10k_handle_suspand and fm10k_handle_resume
functions for the standard suspend/resume flow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 73 ++--------------------------
 1 file changed, 3 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 10fd3f689728..8f68636aab12 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2181,60 +2181,13 @@ static int fm10k_resume(struct pci_dev *pdev)
 	/* refresh hw_addr in case it was dropped */
 	hw->hw_addr = interface->uc_addr;
 
-	/* reset hardware to known state */
-	err = hw->mac.ops.init_hw(&interface->hw);
-	if (err) {
-		dev_err(&pdev->dev, "init_hw failed: %d\n", err);
+	err = fm10k_handle_resume(interface);
+	if (err)
 		return err;
-	}
-
-	/* reset statistics starting values */
-	hw->mac.ops.rebind_hw_stats(hw, &interface->stats);
-
-	rtnl_lock();
-
-	err = fm10k_init_queueing_scheme(interface);
-	if (err)
-		goto err_queueing_scheme;
-
-	err = fm10k_mbx_request_irq(interface);
-	if (err)
-		goto err_mbx_irq;
-
-	err = fm10k_hw_ready(interface);
-	if (err)
-		goto err_open;
-
-	err = netif_running(netdev) ? fm10k_open(netdev) : 0;
-	if (err)
-		goto err_open;
-
-	rtnl_unlock();
-
-	/* assume host is not ready, to prevent race with watchdog in case we
-	 * actually don't have connection to the switch
-	 */
-	interface->host_ready = false;
-	fm10k_watchdog_host_not_ready(interface);
-
-	/* clear the service task disable bit to allow service task to start */
-	clear_bit(__FM10K_SERVICE_DISABLE, &interface->state);
-	fm10k_service_event_schedule(interface);
-
-	/* restore SR-IOV interface */
-	fm10k_iov_resume(pdev);
 
 	netif_device_attach(netdev);
 
 	return 0;
-err_open:
-	fm10k_mbx_free_irq(interface);
-err_mbx_irq:
-	fm10k_clear_queueing_scheme(interface);
-err_queueing_scheme:
-	rtnl_unlock();
-
-	return err;
 }
 
 /**
@@ -2254,27 +2207,7 @@ static int fm10k_suspend(struct pci_dev *pdev,
 
 	netif_device_detach(netdev);
 
-	fm10k_iov_suspend(pdev);
-
-	/* the watchdog tasks may read registers, which will appear like a
-	 * surprise-remove event once the PCI device is disabled. This will
-	 * cause us to close the netdevice, so we don't retain the open/closed
-	 * state post-resume. Prevent this by disabling the service task while
-	 * suspended, until we actually resume.
-	 */
-	set_bit(__FM10K_SERVICE_DISABLE, &interface->state);
-	cancel_work_sync(&interface->service_task);
-
-	rtnl_lock();
-
-	if (netif_running(netdev))
-		fm10k_close(netdev);
-
-	fm10k_mbx_free_irq(interface);
-
-	fm10k_clear_queueing_scheme(interface);
-
-	rtnl_unlock();
+	fm10k_prepare_suspend(interface);
 
 	err = pci_save_state(pdev);
 	if (err)
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 15/18] fm10k: enable bus master after every reset
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (13 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 14/18] fm10k: use common flow for suspend and resume Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 16/18] fm10k: check if PCIe link is restored Jacob Keller
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

If an FLR occurs, VF devices will be knocked out of bus master mode, and
the driver will be unable to recover from the reset properly, resulting
in malicious driver events and an infinite reset loop. In the normal
case, the bus master mode will already be enabled and this call will
essentially be a no-op. Since we're doing this every reset, it is
possible we could remove the other calls to pci_set_master() but it
seems not harmful to just leave them in place.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 8f68636aab12..8eaf6479b9dd 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -174,6 +174,8 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
 
 	rtnl_lock();
 
+	pci_set_master(interface->pdev);
+
 	/* reset and initialize the hardware so it is in a known state */
 	err = hw->mac.ops.reset_hw(hw);
 	if (err) {
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 16/18] fm10k: check if PCIe link is restored
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (14 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 15/18] fm10k: enable bus master after every reset Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 17/18] fm10k: implement request_lport_map pointer Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 18/18] fm10k: bump version number Jacob Keller
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Sometimes, a VF driver will lose PCIe address access, such as due to
a PF FLR event. In fm10k_detach_subtask, poll and check whether the
PCIe register space is active again and restore the device when it has.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 8eaf6479b9dd..358fdb0d990b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -123,11 +123,24 @@ static void fm10k_service_timer(unsigned long data)
 static void fm10k_detach_subtask(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
+	u32 __iomem *hw_addr;
+	u32 value;
 
 	/* do nothing if device is still present or hw_addr is set */
 	if (netif_device_present(netdev) || interface->hw.hw_addr)
 		return;
 
+	/* check the real address space to see if we've recovered */
+	hw_addr = READ_ONCE(interface->uc_addr);
+	value = readl(hw_addr);
+	if ((~value)) {
+		interface->hw.hw_addr = interface->uc_addr;
+		netif_device_attach(netdev);
+		interface->flags |= FM10K_FLAG_RESET_REQUESTED;
+		netdev_warn(netdev, "PCIe link restored, device now attached\n");
+		return;
+	}
+
 	rtnl_lock();
 
 	if (netif_running(netdev))
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 17/18] fm10k: implement request_lport_map pointer
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (15 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 16/18] fm10k: check if PCIe link is restored Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 18/18] fm10k: bump version number Jacob Keller
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

If the fm10k interface is brought up, but the switch manager software is
not running, the driver will continuously request the lport map every
few seconds in the base driver watchdog routine. Eventually after
several minutes the switch mailbox Tx fifo will fill up and the mailbox
will timeout, resulting in a reset. This reset will appear as if for no
reason, and occurs regularly every few minutes until the switch manager
software is loaded.

Prevent this from happening by only requesting the lport map after we've
verified the switch mailbox is tx_ready. In order to simplify code logic
and reduce code duplication, implement this as a new function pointer
"mac.ops.request_lport_map" which the VF will not implement. Otherwise,
we have to duplicate the tx_ready check outside of
fm10k_get_host_state_generic, or re-implement most of
fm10k_get_host_state_generic in the pf version.

The resulting code is simpler and easier to understand, and prevents the
PF from continuously requesting lport map and filling the Tx fifo of
a switch mailbox that isn't ready.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_common.c |  6 +++++-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c     | 15 +++------------
 drivers/net/ethernet/intel/fm10k/fm10k_type.h   |  1 +
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.c b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
index 5bbf19cfe29b..d6baaea8bc7c 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
@@ -519,8 +519,12 @@ s32 fm10k_get_host_state_generic(struct fm10k_hw *hw, bool *host_ready)
 		goto out;
 
 	/* interface cannot receive traffic without logical ports */
-	if (mac->dglort_map == FM10K_DGLORTMAP_NONE)
+	if (mac->dglort_map == FM10K_DGLORTMAP_NONE) {
+		if (hw->mac.ops.request_lport_map)
+			ret_val = hw->mac.ops.request_lport_map(hw);
+
 		goto out;
+	}
 
 	/* if we passed all the tests above then the switch is ready and we no
 	 * longer need to check for link
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 23f3566b17fc..682299dd0ce4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1622,25 +1622,15 @@ static s32 fm10k_request_lport_map_pf(struct fm10k_hw *hw)
  **/
 static s32 fm10k_get_host_state_pf(struct fm10k_hw *hw, bool *switch_ready)
 {
-	s32 ret_val = 0;
 	u32 dma_ctrl2;
 
 	/* verify the switch is ready for interaction */
 	dma_ctrl2 = fm10k_read_reg(hw, FM10K_DMA_CTRL2);
 	if (!(dma_ctrl2 & FM10K_DMA_CTRL2_SWITCH_READY))
-		goto out;
+		return 0;
 
 	/* retrieve generic host state info */
-	ret_val = fm10k_get_host_state_generic(hw, switch_ready);
-	if (ret_val)
-		goto out;
-
-	/* interface cannot receive traffic without logical ports */
-	if (hw->mac.dglort_map == FM10K_DGLORTMAP_NONE)
-		ret_val = fm10k_request_lport_map_pf(hw);
-
-out:
-	return ret_val;
+	return fm10k_get_host_state_generic(hw, switch_ready);
 }
 
 /* This structure defines the attibutes to be parsed below */
@@ -1816,6 +1806,7 @@ static const struct fm10k_mac_ops mac_ops_pf = {
 	.set_dma_mask		= fm10k_set_dma_mask_pf,
 	.get_fault		= fm10k_get_fault_pf,
 	.get_host_state		= fm10k_get_host_state_pf,
+	.request_lport_map	= fm10k_request_lport_map_pf,
 };
 
 static const struct fm10k_iov_ops iov_ops_pf = {
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index 1d65ad85d72e..f4e75c498287 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -526,6 +526,7 @@ struct fm10k_mac_ops {
 	s32 (*stop_hw)(struct fm10k_hw *);
 	s32 (*get_bus_info)(struct fm10k_hw *);
 	s32 (*get_host_state)(struct fm10k_hw *, bool *);
+	s32 (*request_lport_map)(struct fm10k_hw *);
 	s32 (*update_vlan)(struct fm10k_hw *, u32, u8, bool);
 	s32 (*read_mac_addr)(struct fm10k_hw *);
 	s32 (*update_uc_addr)(struct fm10k_hw *, u16, const u8 *,
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 18/18] fm10k: bump version number
  2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
                   ` (16 preceding siblings ...)
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 17/18] fm10k: implement request_lport_map pointer Jacob Keller
@ 2016-06-03 22:42 ` Jacob Keller
  17 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-06-03 22:42 UTC (permalink / raw)
  To: intel-wired-lan

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index c85fc98945fa..5a37dd579e75 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -28,7 +28,7 @@
 
 #include "fm10k.h"
 
-#define DRV_VERSION	"0.19.3-k"
+#define DRV_VERSION	"0.21.1-k"
 #define DRV_SUMMARY	"Intel(R) Ethernet Switch Host Interface Driver"
 const char fm10k_driver_version[] = DRV_VERSION;
 char fm10k_driver_name[] = "fm10k";
-- 
2.8.2.820.gd1c5f70


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw Jacob Keller
@ 2016-06-06 17:38   ` Keller, Jacob E
  0 siblings, 0 replies; 22+ messages in thread
From: Keller, Jacob E @ 2016-06-06 17:38 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 2016-06-03 at 15:42 -0700, Jacob Keller wrote:
> It turns out that VFs also suffer from the same queue issue as PFs,
> and
> can't simply perform a datapath reset. Add a looping delay for
> queueus
> which will sleep and then check the queues to see if they are drained
> before calling stop_hw. If they take longer than 50 loops (~500ms,
> longest delay I found necessary while testing), issue a dev_err
> message
> and continue with the reset. For PFs, this should trigger a datapath
> reset and a recovery. For VFs this will likely cause the VF to be
> stuck
> and need re-creating. Unfortunately there isn't anything else the
> driver
> can do in response.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
It looks like after some testing, this patch may need changes.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set Jacob Keller
@ 2016-07-11 19:23   ` Singh, Krishneil K
  0 siblings, 0 replies; 22+ messages in thread
From: Singh, Krishneil K @ 2016-07-11 19:23 UTC (permalink / raw)
  To: intel-wired-lan



-----Original Message-----
From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On Behalf Of Jacob Keller
Sent: Friday, June 3, 2016 3:42 PM
To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
Subject: [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set

Return early from fm10k_down() when we are already down, since that means another thread is either already finished or has started going down, so shouldn't conflict with them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---

Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats
  2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats Jacob Keller
@ 2016-07-11 19:24   ` Singh, Krishneil K
  0 siblings, 0 replies; 22+ messages in thread
From: Singh, Krishneil K @ 2016-07-11 19:24 UTC (permalink / raw)
  To: intel-wired-lan



-----Original Message-----
From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On Behalf Of Jacob Keller
Sent: Friday, June 3, 2016 3:42 PM
To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
Subject: [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats

It's currently possible for fm10k_update_stats to be called during the window when we go down and the rings are removed. This can result in a null pointer dereference. In fm10k_get_stats64 we work around this by using ACCESS_ONCE and a null pointer check inside the loop. Use this same flow in the fm10k_update_stats to avoid the potential null pointer.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---


Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-07-11 19:24 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-03 22:42 [Intel-wired-lan] [PATCH v1 00/18] fm10k reset + suspend/resume changes Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 01/18] fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set Jacob Keller
2016-07-11 19:23   ` Singh, Krishneil K
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 02/18] fm10k: avoid possible null pointer dereference in fm10k_update_stats Jacob Keller
2016-07-11 19:24   ` Singh, Krishneil K
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 03/18] fm10k: prevent multiple threads updating statistics Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 04/18] fm10k: Reset mailbox global interrupts Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 05/18] fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 06/18] fm10k: perform data path reset even when switch is not ready Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 07/18] fm10k: use actual hardware registers when checking for pending Tx Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 08/18] fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 09/18] fm10k: wait for queues to drain fully before stop_hw Jacob Keller
2016-06-06 17:38   ` Keller, Jacob E
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 10/18] fm10k: split fm10k_reinit into two functions Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 11/18] fm10k: implement prepare_suspend and handle_resume Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 12/18] fm10k: use common reset flow when handling io errors from PCI stack Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 13/18] fm10k: implement reset_notify handler for PCIe FLR events Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 14/18] fm10k: use common flow for suspend and resume Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 15/18] fm10k: enable bus master after every reset Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 16/18] fm10k: check if PCIe link is restored Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 17/18] fm10k: implement request_lport_map pointer Jacob Keller
2016-06-03 22:42 ` [Intel-wired-lan] [PATCH v1 18/18] fm10k: bump version number Jacob Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.