netdev.vger.kernel.org archive mirror
* [PATCH RESEND 0/2] Adapter not recovery from EEH error injection
From: wenxiong @ 2014-05-27 19:11 UTC
  To: davem; +Cc: ariel.elior, netdev

Resending the patches with the new bnx2x maintainers on Cc.

Thanks,
Wendy
-- 


* [PATCH RESEND 1/2] bnx2x: Adapter not recovery from EEH error injection
From: wenxiong @ 2014-05-27 19:11 UTC
  To: davem; +Cc: ariel.elior, netdev, Wen Xiong


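When an EEH error is injected into a bnx2x adapter, the adapter fails
to recover and recursive EEH errors follow. This patch fixes the issue
by switching bnx2x_eeh_nic_unload() to cancel_delayed_work_sync(),
which, unlike cancel_delayed_work(), also waits for an sp_task or
period_task instance that is already running to finish.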

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
===================================================================
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22 18:42:48.000000000 -0500
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22 18:44:36.757765539 -0500
@@ -13279,8 +13279,8 @@ static int bnx2x_eeh_nic_unload(struct b
 	netdev_reset_tc(bp->dev);
 
 	del_timer_sync(&bp->timer);
-	cancel_delayed_work(&bp->sp_task);
-	cancel_delayed_work(&bp->period_task);
+	cancel_delayed_work_sync(&bp->sp_task);
+	cancel_delayed_work_sync(&bp->period_task);
 
 	spin_lock_bh(&bp->stats_lock);
 	bp->stats_state = STATS_STATE_DISABLED;

-- 


* [PATCH RESEND 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
From: wenxiong @ 2014-05-27 19:11 UTC
  To: davem; +Cc: ariel.elior, netdev, Milton Miller, Wen Xiong


An rmb() is required to ensure that the CQE is not read before the
adapter's DMA write of it has completed.  PCI ordering rules guarantee
that the other fields are written before the marker at the end of
struct eth_fast_path_rx_cqe, but without an rmb() a weakly ordered
processor can read those fields ahead of the marker and process stale
data.

Without the barrier we have observed various crashes, including
bnx2x_tpa_start() being called on queues that were not stopped
(resulting in the message "start of bin not in stop") and NULL
pointer exceptions in bnx2x_rx_int().

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
===================================================================
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-23 10:34:21.000000000 -0500
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-27 13:51:12.067764759 -0500
@@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
 		bd_prod = RX_BD(bd_prod);
 		bd_cons = RX_BD(bd_cons);
 
+		/* A rmb() is required to ensure that the CQE is not read
+		 * before it is written by the adapter DMA.  PCI ordering
+		 * rules will make sure the other fields are written before
+		 * the marker at the end of struct eth_fast_path_rx_cqe
+		 * but without rmb() a weakly ordered processor can process
+		 * stale data.  Without the barrier we have observed various
+		 * crashes including bnx2x_tpa_start being called on queues
+		 * not stopped (resulting in message start of bin not in
+		 * stop) and NULL pointer exceptions from bnx2x_rx_int.
+		*/
+		rmb();
+
 		cqe_fp_flags = cqe_fp->type_error_flags;
 		cqe_fp_type = cqe_fp_flags & ETH_FAST_PATH_RX_CQE_TYPE;
 

-- 


* RE: [PATCH RESEND 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
From: Dmitry Kravkov @ 2014-05-28  8:38 UTC
  To: wenxiong@linux.vnet.ibm.com, David Miller
  Cc: Ariel Elior, netdev, Milton Miller

Hi Wen
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com
> Sent: Tuesday, May 27, 2014 10:11 PM
> To: David Miller
> Cc: Ariel Elior; netdev; Milton Miller; Wen Xiong
> Subject: [PATCH RESEND 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
>
> A rmb() is required to ensure that the CQE is not read before it is written by
> the adapter DMA.  PCI ordering rules will make sure the other fields are
> written before the marker at the end of struct eth_fast_path_rx_cqe but
> without rmb() a weakly ordered processor can process stale data.
>
> Without the barrier we have observed various crashes including
> bnx2x_tpa_start being called on queues not stopped (resulting in message
> start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.
>
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> ===================================================================
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 2014-05-23 10:34:21.000000000 -0500
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 2014-05-27 13:51:12.067764759 -0500
> @@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
>               bd_prod = RX_BD(bd_prod);
>               bd_cons = RX_BD(bd_cons);
>
> +             /* A rmb() is required to ensure that the CQE is not read
> +              * before it is written by the adapter DMA.  PCI ordering
> +              * rules will make sure the other fields are written before
> +              * the marker at the end of struct eth_fast_path_rx_cqe
> +              * but without rmb() a weakly ordered processor can process
> +              * stale data.  Without the barrier we have observed various
> +              * crashes including bnx2x_tpa_start being called on queues
> +              * not stopped (resulting in message start of bin not in
> +              * stop) and NULL pointer exceptions from bnx2x_rx_int.
> +             */
Can you please drop the third sentence from the comment, or rephrase it in a more generic way, such as:
Without the barrier, the TPA state machine might enter an inconsistent state and the kernel stack might be provided with an incorrect packet description, which can lead to various kernel crashes.

Thanks
Dmitry
> +             rmb();
> +
>               cqe_fp_flags = cqe_fp->type_error_flags;
>               cqe_fp_type = cqe_fp_flags & ETH_FAST_PATH_RX_CQE_TYPE;
>
>
