* [PATCH V2 0/2] Adapter not recovery from EEH error injection
@ 2014-05-28 15:56 wenxiong
  2014-05-28 15:56 ` [PATCH V2 1/2] bnx2x: " wenxiong
  2014-05-28 15:56 ` [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery wenxiong
  0 siblings, 2 replies; 7+ messages in thread
From: wenxiong @ 2014-05-28 15:56 UTC (permalink / raw)
  To: davem; +Cc: ariel.elior, netdev

Updated with Dmitry's comments.

Thanks,
Wendy

--

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH V2 1/2] bnx2x: Adapter not recovery from EEH error injection
  2014-05-28 15:56 [PATCH V2 0/2] Adapter not recovery from EEH error injection wenxiong
@ 2014-05-28 15:56 ` wenxiong
  2014-05-30 12:32   ` Dmitry Kravkov
  2014-05-28 15:56 ` [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery wenxiong
  1 sibling, 1 reply; 7+ messages in thread
From: wenxiong @ 2014-05-28 15:56 UTC (permalink / raw)
  To: davem; +Cc: ariel.elior, netdev, Wen Xiong

[-- Attachment #1: bnx2x_eeh_fix --]
[-- Type: text/plain, Size: 985 bytes --]

When injecting EEH error to bnx2x adapter, adapter couldn't be recovery
and caused recursive EEH errors. The patch fixes the issue.

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
===================================================================
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22 18:42:48.000000000 -0500
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22 18:44:36.757765539 -0500
@@ -13279,8 +13279,8 @@ static int bnx2x_eeh_nic_unload(struct b
 	netdev_reset_tc(bp->dev);

 	del_timer_sync(&bp->timer);
-	cancel_delayed_work(&bp->sp_task);
-	cancel_delayed_work(&bp->period_task);
+	cancel_delayed_work_sync(&bp->sp_task);
+	cancel_delayed_work_sync(&bp->period_task);

 	spin_lock_bh(&bp->stats_lock);
 	bp->stats_state = STATS_STATE_DISABLED;

--

^ permalink raw reply	[flat|nested] 7+ messages in thread
* RE: [PATCH V2 1/2] bnx2x: Adapter not recovery from EEH error injection
  2014-05-28 15:56 ` [PATCH V2 1/2] bnx2x: " wenxiong
@ 2014-05-30 12:32   ` Dmitry Kravkov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Kravkov @ 2014-05-30 12:32 UTC (permalink / raw)
  To: wenxiong@linux.vnet.ibm.com, David Miller; +Cc: Ariel Elior, netdev

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com
> Sent: Wednesday, May 28, 2014 6:57 PM
> To: David Miller
> Cc: Ariel Elior; netdev; Wen Xiong
> Subject: [PATCH V2 1/2] bnx2x: Adapter not recovery from EEH error
> injection
>
> When injecting EEH error to bnx2x adapter, adapter couldn't be recovery
> and caused recursive EEH errors. The patch fixes the issue.
>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> ===================================================================
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22
> 18:42:48.000000000 -0500
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c	2014-05-22
> 18:44:36.757765539 -0500
> @@ -13279,8 +13279,8 @@ static int bnx2x_eeh_nic_unload(struct b
>  	netdev_reset_tc(bp->dev);
>
>  	del_timer_sync(&bp->timer);
> -	cancel_delayed_work(&bp->sp_task);
> -	cancel_delayed_work(&bp->period_task);
> +	cancel_delayed_work_sync(&bp->sp_task);
> +	cancel_delayed_work_sync(&bp->period_task);
>
>  	spin_lock_bh(&bp->stats_lock);
>  	bp->stats_state = STATS_STATE_DISABLED;
>

Thanks.
Reviewed-by: Dmitry Kravkov <dmitry.kravkov@qlogic.com>

> --
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

________________________________

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
  2014-05-28 15:56 [PATCH V2 0/2] Adapter not recovery from EEH error injection wenxiong
  2014-05-28 15:56 ` [PATCH V2 1/2] bnx2x: " wenxiong
@ 2014-05-28 15:56 ` wenxiong
  2014-05-28 19:23   ` Dmitry Kravkov
  2014-06-02  2:26   ` David Miller
  1 sibling, 2 replies; 7+ messages in thread
From: wenxiong @ 2014-05-28 15:56 UTC (permalink / raw)
  To: davem; +Cc: ariel.elior, netdev, Milton Miller, Wen Xiong

[-- Attachment #1: bnx2x_rmb_fix --]
[-- Type: text/plain, Size: 1787 bytes --]

A rmb() is required to ensure that the CQE is not read before it
is written by the adapter DMA. PCI ordering rules will make sure
the other fields are written before the marker at the end of struct
eth_fast_path_rx_cqe but without rmb() a weakly ordered processor can
process stale data.

Without the barrier we have observed various crashes including
bnx2x_tpa_start being called on queues not stopped (resulting in message
start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
===================================================================
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-23 10:34:21.000000000 -0500
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-28 10:54:26.627766086 -0500
@@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
 		bd_prod = RX_BD(bd_prod);
 		bd_cons = RX_BD(bd_cons);

+		/* A rmb() is required to ensure that the CQE is not read
+		 * before it is written by the adapter DMA. PCI ordering
+		 * rules will make sure the other fields are written before
+		 * the marker at the end of struct eth_fast_path_rx_cqe
+		 * but without rmb() a weakly ordered processor can process
+		 * stale data. Without the barrier TPA state-machine might
+		 * enter inconsistent state and kernel stack might be
+		 * provided with incorrect packet description - these lead
+		 * to various kernel crashed.
+		*/
+		rmb();
+
 		cqe_fp_flags = cqe_fp->type_error_flags;
 		cqe_fp_type = cqe_fp_flags & ETH_FAST_PATH_RX_CQE_TYPE;

--

^ permalink raw reply	[flat|nested] 7+ messages in thread
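
[Editor's note] The ordering argument in the commit message above can be illustrated with a small userspace sketch. This is not the driver's code: the struct, the function names (`demo_cqe`, `demo_cqe_publish`, `demo_cqe_poll`) and the field layout are hypothetical, a plain function stands in for the adapter's DMA writes, and a C11 acquire fence is used only as a rough analogue of the kernel's rmb(). The point it tries to show is the consumer-side pattern: check the completion marker first, then issue the read barrier, and only then read the other fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion entry: payload fields first, marker last. */
struct demo_cqe {
	uint32_t flags;           /* payload, written before the marker */
	uint32_t len;             /* payload, written before the marker */
	_Atomic uint32_t marker;  /* non-zero once the entry is complete */
};

/* Snapshot of the payload handed back to the caller. */
struct demo_cqe_view {
	uint32_t flags;
	uint32_t len;
};

/*
 * Producer side (stands in for the adapter): write the payload, then
 * publish the marker with release semantics so the payload writes are
 * ordered before it.
 */
void demo_cqe_publish(struct demo_cqe *cqe, uint32_t flags, uint32_t len)
{
	assert(cqe != NULL);
	cqe->flags = flags;
	cqe->len = len;
	atomic_store_explicit(&cqe->marker, 1, memory_order_release);
}

/*
 * Consumer side: returns false if the entry is not complete yet.
 * The acquire fence sits between the marker check and the payload
 * reads; without it a weakly ordered CPU could satisfy the payload
 * loads before the marker load and observe stale values.
 */
bool demo_cqe_poll(struct demo_cqe *cqe, struct demo_cqe_view *out)
{
	assert(cqe != NULL && out != NULL);
	if (atomic_load_explicit(&cqe->marker, memory_order_relaxed) == 0)
		return false;
	atomic_thread_fence(memory_order_acquire); /* rough analogue of rmb() */
	out->flags = cqe->flags;
	out->len = cqe->len;
	return true;
}
```

A caller would poll with `demo_cqe_poll()` and touch the returned view only when it reports true. In the kernel the barrier also has to order reads against device DMA, which the C11 memory model does not cover, so treat this as an analogy for the pattern rather than a drop-in equivalent.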
* RE: [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
  2014-05-28 15:56 ` [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery wenxiong
@ 2014-05-28 19:23   ` Dmitry Kravkov
  2014-05-29 14:50     ` wenxiong
  2014-06-02  2:26   ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Dmitry Kravkov @ 2014-05-28 19:23 UTC (permalink / raw)
  To: wenxiong@linux.vnet.ibm.com, David Miller
  Cc: Ariel Elior, netdev, Milton Miller

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com
> Sent: Wednesday, May 28, 2014 6:57 PM
> To: David Miller
> Cc: Ariel Elior; netdev; Milton Miller; Wen Xiong
> Subject: [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after
> EEH recovery
>
> A rmb() is required to ensure that the CQE is not read before it is written by
> the adapter DMA. PCI ordering rules will make sure the other fields are
> written before the marker at the end of struct eth_fast_path_rx_cqe but
> without rmb() a weakly ordered processor can process stale data.
>
> Without the barrier we have observed various crashes including
> bnx2x_tpa_start being called on queues not stopped (resulting in message
> start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.
>
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> ===================================================================
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-23
> 10:34:21.000000000 -0500
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-28
> 10:54:26.627766086 -0500
> @@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
>  		bd_prod = RX_BD(bd_prod);
>  		bd_cons = RX_BD(bd_cons);
>
> +		/* A rmb() is required to ensure that the CQE is not read
> +		 * before it is written by the adapter DMA. PCI ordering
> +		 * rules will make sure the other fields are written before
> +		 * the marker at the end of struct eth_fast_path_rx_cqe
> +		 * but without rmb() a weakly ordered processor can process
> +		 * stale data. Without the barrier TPA state-machine might
> +		 * enter inconsistent state and kernel stack might be
> +		 * provided with incorrect packet description - these lead
> +		 * to various kernel crashed.
> +		*/
> +		rmb();
> +
>  		cqe_fp_flags = cqe_fp->type_error_flags;
>  		cqe_fp_type = cqe_fp_flags &
> ETH_FAST_PATH_RX_CQE_TYPE;
>

The subject states 'EEH recovery', but looks like this is not the only case.
Other than that:

Acked-by: Dmitry Kravkov <dmitry.kravkov@qlogic.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
  2014-05-28 19:23   ` Dmitry Kravkov
@ 2014-05-29 14:50     ` wenxiong
  0 siblings, 0 replies; 7+ messages in thread
From: wenxiong @ 2014-05-29 14:50 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: David Miller, Ariel Elior, netdev, Milton Miller

Quoting Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>:

>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of wenxiong@linux.vnet.ibm.com
>> Sent: Wednesday, May 28, 2014 6:57 PM
>> To: David Miller
>> Cc: Ariel Elior; netdev; Milton Miller; Wen Xiong
>> Subject: [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after
>> EEH recovery
>>
>> A rmb() is required to ensure that the CQE is not read before it is
>> written by
>> the adapter DMA. PCI ordering rules will make sure the other fields are
>> written before the marker at the end of struct eth_fast_path_rx_cqe but
>> without rmb() a weakly ordered processor can process stale data.
>>
>> Without the barrier we have observed various crashes including
>> bnx2x_tpa_start being called on queues not stopped (resulting in message
>> start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.
>>
>> Signed-off-by: Milton Miller <miltonm@bga.com>
>> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>> ---
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> ===================================================================
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-23
>> 10:34:21.000000000 -0500
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-28
>> 10:54:26.627766086 -0500
>> @@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
>>  		bd_prod = RX_BD(bd_prod);
>>  		bd_cons = RX_BD(bd_cons);
>>
>> +		/* A rmb() is required to ensure that the CQE is not read
>> +		 * before it is written by the adapter DMA. PCI ordering
>> +		 * rules will make sure the other fields are written before
>> +		 * the marker at the end of struct eth_fast_path_rx_cqe
>> +		 * but without rmb() a weakly ordered processor can process
>> +		 * stale data. Without the barrier TPA state-machine might
>> +		 * enter inconsistent state and kernel stack might be
>> +		 * provided with incorrect packet description - these lead
>> +		 * to various kernel crashed.
>> +		*/
>> +		rmb();
>> +
>>  		cqe_fp_flags = cqe_fp->type_error_flags;
>>  		cqe_fp_type = cqe_fp_flags &
>> ETH_FAST_PATH_RX_CQE_TYPE;
>>
> The subject states 'EEH recovery', but looks like this is not the
> only case. Other than that:
>
> Acked-by: Dmitry Kravkov <dmitry.kravkov@qlogic.com>
>
>

Hi Dmitry,

Can you check if you have any comments for 1/2 patch in this series?
If you don't, can you ack the 1/2 patch as well?

Thanks,
Wendy

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery
  2014-05-28 15:56 ` [PATCH V2 2/2] bnx2x: Fix kernel crash and data miscompare after EEH recovery wenxiong
  2014-05-28 19:23   ` Dmitry Kravkov
@ 2014-06-02  2:26   ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2014-06-02 2:26 UTC (permalink / raw)
  To: wenxiong; +Cc: ariel.elior, netdev, miltonm

From: wenxiong@linux.vnet.ibm.com
Date: Wed, 28 May 2014 10:56:43 -0500

> A rmb() is required to ensure that the CQE is not read before it
> is written by the adapter DMA. PCI ordering rules will make sure
> the other fields are written before the marker at the end of struct
> eth_fast_path_rx_cqe but without rmb() a weakly ordered processor can
> process stale data.
>
> Without the barrier we have observed various crashes including
> bnx2x_tpa_start being called on queues not stopped (resulting in message
> start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.
>
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> Index: b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> ===================================================================
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-23 10:34:21.000000000 -0500
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c	2014-05-28 10:54:26.627766086 -0500
> @@ -906,6 +906,18 @@ static int bnx2x_rx_int(struct bnx2x_fas
>  		bd_prod = RX_BD(bd_prod);
>  		bd_cons = RX_BD(bd_cons);
>
> +		/* A rmb() is required to ensure that the CQE is not read
> +		 * before it is written by the adapter DMA. PCI ordering
> +		 * rules will make sure the other fields are written before
> +		 * the marker at the end of struct eth_fast_path_rx_cqe
> +		 * but without rmb() a weakly ordered processor can process
> +		 * stale data. Without the barrier TPA state-machine might
> +		 * enter inconsistent state and kernel stack might be
> +		 * provided with incorrect packet description - these lead
> +		 * to various kernel crashed.
> +		*/

Missing a space before the final "*/", this is not formatted correctly.

Please fix this and resubmit the whole series.

^ permalink raw reply	[flat|nested] 7+ messages in thread