netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] be2net: Ignore spurious UE indication from NIC
@ 2012-09-21 16:36 Ajit Khaparde
  2012-09-21 19:04 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Ajit Khaparde @ 2012-09-21 16:36 UTC (permalink / raw)
  To: davem; +Cc: netdev

Ignore spurious UE indication seen on some platforms.
Consider the error as un-recoverable only when the bits
stay high during second sampling.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h      |    2 ++
 drivers/net/ethernet/emulex/benet/be_main.c |   18 +++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 5b622993..3d4a7bc 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -401,6 +401,8 @@ struct be_adapter {
 	bool eeh_error;
 	bool fw_timeout;
 	bool hw_error;
+	u32 ue_lo;
+	u32 ue_hi;
 
 	u32 port_num;
 	bool promiscuous;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 84379f4..e970f77 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2105,6 +2105,7 @@ void be_detect_error(struct be_adapter *adapter)
 	u32 ue_lo = 0, ue_hi = 0, ue_lo_mask = 0, ue_hi_mask = 0;
 	u32 sliport_status = 0, sliport_err1 = 0, sliport_err2 = 0;
 	u32 i;
+	struct device *dev = &adapter->pdev->dev;
 
 	if (be_crit_error(adapter))
 		return;
@@ -2129,13 +2130,22 @@ void be_detect_error(struct be_adapter *adapter)
 
 		ue_lo = (ue_lo & ~ue_lo_mask);
 		ue_hi = (ue_hi & ~ue_hi_mask);
+		if (ue_lo != adapter->ue_lo || ue_hi != adapter->ue_hi) {
+			dev_err(dev, "UE read: 0x%x/0x%x\n", ue_lo, ue_hi);
+			goto done;
+		}
+	}
+
+	if (ue_lo == 0xffffffff || ue_hi == 0xffffffff) {
+		adapter->eeh_error = true;
+		dev_err(dev, "PCI slot disconnected\n");
+		goto done;
 	}
 
 	if (ue_lo || ue_hi ||
 		sliport_status & SLIPORT_STATUS_ERR_MASK) {
 		adapter->hw_error = true;
-		dev_err(&adapter->pdev->dev,
-			"Error detected in the card\n");
+		dev_err(dev, "UE detected\n");
 	}
 
 	if (sliport_status & SLIPORT_STATUS_ERR_MASK) {
@@ -2162,7 +2172,9 @@ void be_detect_error(struct be_adapter *adapter)
 				"UE: %s bit set\n", ue_status_hi_desc[i]);
 		}
 	}
-
+done:
+	adapter->ue_lo = ue_lo;
+	adapter->ue_hi = ue_hi;
 }
 
 static void be_msix_disable(struct be_adapter *adapter)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] be2net: Ignore spurious UE indication from NIC
  2012-09-21 16:36 [PATCH net-next] be2net: Ignore spurious UE indication from NIC Ajit Khaparde
@ 2012-09-21 19:04 ` David Miller
  2012-09-26 22:35   ` Khaparde, Ajit
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2012-09-21 19:04 UTC (permalink / raw)
  To: ajit.khaparde; +Cc: netdev

From: Ajit Khaparde <ajit.khaparde@emulex.com>
Date: Fri, 21 Sep 2012 11:36:20 -0500

> Ignore spurious UE indication seen on some platforms.
> Consider the error as un-recoverable only when the bits
> stay high during second sampling.
> 
> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>

Treating uncorrectable errors as spurious seems like an invitation
for hard to track down data corruption to me.

You'll need to come up with a more sophisticated test for
spurious other than "happens more than once" before I'm willing
to subject the entire world to this kind of potential problem.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH net-next] be2net: Ignore spurious UE indication from NIC
  2012-09-21 19:04 ` David Miller
@ 2012-09-26 22:35   ` Khaparde, Ajit
  2012-09-26 23:33     ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Khaparde, Ajit @ 2012-09-26 22:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

> From: David Miller [davem@davemloft.net]
> Sent: Friday, September 21, 2012 2:04 PM
> To: Khaparde, Ajit
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] be2net: Ignore spurious UE indication from NIC
>
> From: Ajit Khaparde <ajit.khaparde@emulex.com>
> Date: Fri, 21 Sep 2012 11:36:20 -0500
>
>> Ignore spurious UE indication seen on some platforms.
>> Consider the error as un-recoverable only when the bits
>> stay high during second sampling.
>>
>> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
>
> Treating uncorrectable errors as spurious seems like an invitation
> for hard to track down data corruption to me.
>
> You'll need to come up with a more sophisticated test for
> spurious other than "happens more than once" before I'm willing
> to subject the entire world to this kind of potential problem.

If the UE is real, then the hardware will stop responding to requests.
The hardware block goes offline automatically and no traffic will flow.
After this the hardware will generate a register dump, which can be
retrived using ethtool.

A spurious UE or a data corruption will not generate this dump.

The detection logic is merely to inform the user about the failure
and also avoid any further access to the NIC. This will also prevent
the driver unload from having to timeout on each access to the hardware
which could take a long time to complete.

Please let me know if this is enough to differentiate the scenarios.

Thanks
-Ajit

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] be2net: Ignore spurious UE indication from NIC
  2012-09-26 22:35   ` Khaparde, Ajit
@ 2012-09-26 23:33     ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2012-09-26 23:33 UTC (permalink / raw)
  To: Ajit.Khaparde; +Cc: netdev

From: "Khaparde, Ajit" <Ajit.Khaparde@Emulex.Com>
Date: Wed, 26 Sep 2012 22:35:14 +0000

>> From: David Miller [davem@davemloft.net]
>> Sent: Friday, September 21, 2012 2:04 PM
>> To: Khaparde, Ajit
>> Cc: netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next] be2net: Ignore spurious UE indication from NIC
>>
>> From: Ajit Khaparde <ajit.khaparde@emulex.com>
>> Date: Fri, 21 Sep 2012 11:36:20 -0500
>>
>>> Ignore spurious UE indication seen on some platforms.
>>> Consider the error as un-recoverable only when the bits
>>> stay high during second sampling.
>>>
>>> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
>>
>> Treating uncorrectable errors as spurious seems like an invitation
>> for hard to track down data corruption to me.
>>
>> You'll need to come up with a more sophisticated test for
>> spurious other than "happens more than once" before I'm willing
>> to subject the entire world to this kind of potential problem.
> 
> If the UE is real, then the hardware will stop responding to requests.
> The hardware block goes offline automatically and no traffic will flow.
> After this the hardware will generate a register dump, which can be
> retrived using ethtool.
> 
> A spurious UE or a data corruption will not generate this dump.
> 
> The detection logic is merely to inform the user about the failure
> and also avoid any further access to the NIC.

So it sounds more like the UE test should be removed completely
since you cannot rely upon it, and if a real UE happens the hardware
will stop and you will already report the problem when the hardware
block goes offline.

Furthermore, it also implies that you can test if the hardware block
is offline to validate the UE indication.

All of which says irrefutably that your patch is still the wrong way
to do this.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-09-26 23:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-21 16:36 [PATCH net-next] be2net: Ignore spurious UE indication from NIC Ajit Khaparde
2012-09-21 19:04 ` David Miller
2012-09-26 22:35   ` Khaparde, Ajit
2012-09-26 23:33     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).