linux-arm-kernel.lists.infradead.org archive mirror
* [E1000-devel] ARM support for igb driver
       [not found]                 ` <CAHH3p5L6WM3FaZ19Tw9vUcsk+kERfBKnGC2BRtnftJR1pbSF7g@mail.gmail.com>
@ 2014-05-05 15:28                   ` Alexander Duyck
       [not found]                     ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com>
  2014-05-05 20:38                     ` Thomas Petazzoni
  0 siblings, 2 replies; 28+ messages in thread
From: Alexander Duyck @ 2014-05-05 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/04/2014 11:55 PM, shiv prakash Agarwal wrote:
> + linux-arm-kernel mailing list.
> 
> Thanks Alex,
> 
> 1. So overall issue is any memory/config space access hangs (logs above)
> if bus master enable bit is set on IGB NIC card; this is not observed
> with E1000E NIC cards on same platform.
> 
> 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not
> sure how much it's related to ARM though.
> 
> 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am
> not sure if this has anything to do with above issue.
> RC config is same for both cases.
> 
> IGB / E1000E
> 
> Command Status: INTx+/INTx-
> PM Status:           NoSoftRst+/NoSoftRst-
> DevCap:                FLReset-/FLReset+
> No Dev/Link2 Cap/Sta Registers for E1000E
> Some differences in AER Registers
> 
> 4. Any idea, if this card is verified on ARM by anybody?
> 

It seems like you are glossing over the obvious issue.  You said it
yourself, this works fine on x86.  Therefore this is likely VERY related
to ARM, or at least your specific ARM platform configuration.

You also mention "some differences in the AER Registers", how about you
tell us what was different there since as I pointed out that could tell
us if there is some error the device detected that is triggering the
problem, or better yet could you just send us the lspci -vvv output from
the problem system.  That would give us much more to work with and help
us to understand what the issue is.

Thanks,

Alex

* [E1000-devel] ARM support for igb driver
       [not found]                     ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com>
@ 2014-05-05 20:00                       ` Alexander Duyck
  0 siblings, 0 replies; 28+ messages in thread
From: Alexander Duyck @ 2014-05-05 20:00 UTC (permalink / raw)
  To: linux-arm-kernel

So like I said the AER tells the tale.

Note this bit in your AER config on the IGB NIC:
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

You see the part that has "UESta: DLP+".  That means that there was a
Data Link protocol error if I am not mistaken.  As a result, as soon as
you turn on the Bus Master Enable the device will issue a message
indicating a Fatal Error to the root complex.  I suspect your root
complex is responding to the Fatal Error by hanging the system.
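
To check the flag against a raw register read, here is a minimal Python
sketch that decodes a 32-bit UESta value into the abbreviations lspci
prints. The bit positions follow the AER Uncorrectable Error Status layout
as mirrored in Linux's pci_regs.h; the function name decode_uesta is made up
for illustration and is not part of any tool.

```python
# Bit positions of the AER Uncorrectable Error Status register,
# keyed by the abbreviations that lspci -vvv prints.
UNCORRECTABLE_BITS = {
    "DLP": 4, "SDES": 5, "TLP": 12, "FCP": 13,
    "CmpltTO": 14, "CmpltAbrt": 15, "UnxCmplt": 16, "RxOF": 17,
    "MalfTLP": 18, "ECRC": 19, "UnsupReq": 20, "ACSViol": 21,
}


def decode_uesta(value):
    """Return the names of the error bits set in a raw 32-bit UESta value."""
    return [name for name, bit in UNCORRECTABLE_BITS.items()
            if value & (1 << bit)]


# A register value with only bit 4 set corresponds to "DLP+" in lspci.
print(decode_uesta(0x00000010))
```

The same table applies to UEMsk and UESvrt, since all three registers share
one bit layout.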

My advice would be to first find out what is causing the DLP error and
prevent it from happening.  It is likely something related to the PCIe
bus the device is connected to.

Then in the meantime you might be able to also work around the issue by
reading/writing the value from the Uncorrectable Status register back
onto itself to clear the error bit and prevent the message from being
sent.  If nothing else you can probably just write all 0xFF's via setpci
to the register to clear it.  You just need to make sure none of the
UESta bits are set before you set the BME.
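
Both suggestions work because the AER status bits are write-1-to-clear:
writing a 1 clears a bit, writing a 0 leaves it alone. A minimal Python
sketch of that behaviour follows. The helper names are hypothetical, and the
generated setpci argument assumes the device address 01:00.0 and the AER
capability offset 0x100 shown in the lspci dump in this thread, with UESta
at offset 0x04 inside the capability. Treat it as an illustration, not a
tested recipe; it needs root and working extended config space access.

```python
AER_BASE = 0x100      # assumption: "[100 v2] Advanced Error Reporting" in the dump
UESTA_OFFSET = 0x04   # Uncorrectable Error Status offset inside the AER capability


def rw1c_write(current, written):
    """Model a write to a write-1-to-clear register: 1 clears, 0 keeps."""
    return current & ~written & 0xFFFFFFFF


def setpci_clear_command(device="01:00.0"):
    """Build a setpci invocation that writes all ones to UESta (hypothetical helper)."""
    return f"setpci -s {device} {AER_BASE + UESTA_OFFSET:x}.l=ffffffff"


uesta = 0x00000010                         # DLP set, as in the dump
print(hex(rw1c_write(uesta, uesta)))       # write the value back onto itself
print(hex(rw1c_write(uesta, 0xFFFFFFFF)))  # write all ones
print(hex(rw1c_write(uesta, 0x00000000)))  # writing zeros changes nothing
print(setpci_clear_command())
```

Writing the read value back is the gentler of the two, since it only touches
bits that were actually observed.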

Thanks,

Alex

On 05/05/2014 11:34 AM, shiv prakash Agarwal wrote:
> 1. Below is lspci output for IGB NIC and E1000E NIC
> 2. Although we are seeing this on an ARM platform, we need to find the
> root cause of why this occurs.
> 
> a) IGB NIC
> 01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
>         Subsystem: Intel Corporation Ethernet Server Adapter I210-T1
>         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
>         Interrupt: pin A routed to IRQ 130
>         Region 0: Memory at 32100000 (32-bit, non-prefetchable) [size=1M]
>         Region 3: Memory at 32200000 (32-bit, non-prefetchable) [size=16K]
>         [virtual] Expansion ROM at 12100000 [disabled] [size=1M]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>                 Address: 0000000000000000  Data: 0000
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00002000
>         Capabilities: [a0] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <16us
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>         Capabilities: [140 v1] Device Serial Number a0-36-9f-ff-ff-24-64-ef
>         Capabilities: [1a0 v1] Transaction Processing Hints
>                 Device specific mode supported
>                 Steering table in TPH capability structure
>         Kernel driver in use: igb
> 
> 
> b) E1000E NIC:
> 01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>         Subsystem: Intel Corporation Gigabit CT2 Desktop Adapter
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 130
>         Region 0: Memory at 32180000 (32-bit, non-prefetchable) [size=128K]
>         Region 1: Memory at 32100000 (32-bit, non-prefetchable) [size=512K]
>         Region 2: I/O ports at 1000 [disabled] [size=32]
>         Region 3: Memory at 321a0000 (32-bit, non-prefetchable) [size=16K]
>         [virtual] Expansion ROM at 12100000 [disabled] [size=256K]
>         Capabilities: [c8] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00002000
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>         Capabilities: [140 v1] Device Serial Number 68-05-ca-ff-ff-12-c3-cb
>         Kernel driver in use: e1000e
> 
> 
> 
> On Mon, May 5, 2014 at 8:58 PM, Alexander Duyck
> <alexander.h.duyck@intel.com> wrote:
> 
>     On 05/04/2014 11:55 PM, shiv prakash Agarwal wrote:
>     > + linux-arm-kernel mailing list.
>     >
>     > Thanks Alex,
>     >
>     > 1. So overall issue is any memory/config space access hangs (logs above)
>     > if bus master enable bit is set on IGB NIC card; this is not observed
>     > with E1000E NIC cards on same platform.
>     >
>     > 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not
>     > sure how much it's related to ARM though.
>     >
>     > 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am
>     > not sure if this has anything to do with above issue.
>     > RC config is same for both cases.
>     >
>     > IGB / E1000E
>     >
>     > Command Status: INTx+/INTx-
>     > PM Status:           NoSoftRst+/NoSoftRst-
>     > DevCap:                FLReset-/FLReset+
>     > No Dev/Link2 Cap/Sta Registers for E1000E
>     > Some differences in AER Registers
>     >
>     > 4. Any idea, if this card is verified on ARM by anybody?
>     >
> 
>     It seems like you are glossing over the obvious issue.  You said it
>     yourself, this works fine on x86.  Therefore this is likely VERY related
>     to ARM, or at least your specific ARM platform configuration.
> 
>     You also mention "some differences in the AER Registers", how about you
>     tell us what was different there since as I pointed out that could tell
>     us if there is some error the device detected that is triggering the
>     problem, or better yet could you just send us the lspci -vvv output from
>     the problem system.  That would give us much more to work with and help
>     us to understand what the issue is.
> 
>     Thanks,
> 
>     Alex
> 
> 
> 

* [E1000-devel] ARM support for igb driver
  2014-05-05 15:28                   ` [E1000-devel] ARM support for igb driver Alexander Duyck
       [not found]                     ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com>
@ 2014-05-05 20:38                     ` Thomas Petazzoni
  2014-05-05 21:20                       ` Alexander Duyck
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-05 20:38 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Alexander Duyck,

On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote:

> > 1. So overall issue is any memory/config space access hangs (logs above)
> > if bus master enable bit is set on IGB NIC card; this is not observed
> > with E1000E NIC cards on same platform.
> > 
> > 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not
> > sure how much it's related to ARM though.
> > 
> > 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am
> > not sure if this has anything to do with above issue.
> > RC config is same for both cases.
> > 
> > IGB / E1000E
> > 
> > Command Status: INTx+/INTx-
> > PM Status:           NoSoftRst+/NoSoftRst-
> > DevCap:                FLReset-/FLReset+
> > No Dev/Link2 Cap/Sta Registers for E1000E
> > Some differences in AER Registers
> > 
> > 4. Any idea, if this card is verified on ARM by anybody?
> > 
> 
> It seems like you are glossing over the obvious issue.  You said it
> yourself, this works fine on x86.  Therefore this is likely VERY related
> to ARM, or at least your specific ARM platform configuration.

Since I haven't seen the beginning of the thread, I might be completely
off topic. However, I wanted to mention that I have successfully used
and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is
useful, I'd be happy to provide you with additional details upon
request.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

* [E1000-devel] ARM support for igb driver
  2014-05-05 20:38                     ` Thomas Petazzoni
@ 2014-05-05 21:20                       ` Alexander Duyck
       [not found]                         ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>
                                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Alexander Duyck @ 2014-05-05 21:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
> Dear Alexander Duyck,
> 
> On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote:
> 
>>> 1. So overall issue is any memory/config space access hangs (logs above)
>>> if bus master enable bit is set on IGB NIC card; this is not observed
>>> with E1000E NIC cards on same platform.
>>>
>>> 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not
>>> sure how much it's related to ARM though.
>>>
>>> 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am
>>> not sure if this has anything to do with above issue.
>>> RC config is same for both cases.
>>>
>>> IGB / E1000E
>>>
>>> Command Status: INTx+/INTx-
>>> PM Status:           NoSoftRst+/NoSoftRst-
>>> DevCap:                FLReset-/FLReset+
>>> No Dev/Link2 Cap/Sta Registers for E1000E
>>> Some differences in AER Registers
>>>
>>> 4. Any idea, if this card is verified on ARM by anybody?
>>>
>>
>> It seems like you are glossing over the obvious issue.  You said it
>> yourself, this works fine on x86.  Therefore this is likely VERY related
>> to ARM, or at least your specific ARM platform configuration.
> 
> Since I haven't seen the beginning of the thread, I might be completely
> off topic. However, I wanted to mention that I have successfully used
> and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is
> useful, I'd be happy to provide you with additional details upon
> request.
> 
> Best regards,
> 
> Thomas
> 


Thomas,

Glad to hear that this is working on your ARM platform as expected.

I believe the issue Shiv is having is due to a problem with the specific
platform as the IGB device is reporting a Data Link Protocol error via
AER and I believe this is what is causing his platform issues.  On
enabling BME the device is likely signalling a Fatal Error message in
response to the DLP error.  The original error he was seeing was:

Unhandled fault: imprecise external abort (0x1406) at 0x00000000
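
The reasoning above can be sketched in Python: an uncorrectable error is
pending when its status bit is set and not masked, and the severity register
decides whether it is reported as fatal or non-fatal. This is a simplified
model under stated assumptions. It ignores the DevCtl reporting enables and
other gating, and the function name pending_uncorrectable is made up for
illustration. The register values are the ones from the IGB lspci dump in
this thread.

```python
# Bit masks for the flags set in the IGB dump (AER uncorrectable layout).
DLP = 1 << 4
SDES = 1 << 5
FCP = 1 << 13
RXOF = 1 << 17
MALFTLP = 1 << 18


def pending_uncorrectable(uesta, uemsk, uesvrt):
    """Split unmasked uncorrectable errors into fatal and non-fatal bit masks."""
    unmasked = uesta & ~uemsk
    return {"fatal": unmasked & uesvrt, "non_fatal": unmasked & ~uesvrt}


# UESta: DLP+   UEMsk: all clear   UESvrt: DLP+ SDES+ FCP+ RxOF+ MalfTLP+
result = pending_uncorrectable(
    uesta=DLP,
    uemsk=0,
    uesvrt=DLP | SDES | FCP | RXOF | MALFTLP,
)
print({kind: hex(bits) for kind, bits in result.items()})
```

With these values the DLP bit lands in the fatal mask, which matches the
Fatal Error message described above.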

Thanks,

Alex

* [E1000-devel] ARM support for igb driver
       [not found]                         ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>
@ 2014-05-06 14:58                           ` Alexander Duyck
  2014-06-03 14:57                             ` Ben Dooks
  0 siblings, 1 reply; 28+ messages in thread
From: Alexander Duyck @ 2014-05-06 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

Shiv,

I think we are at the limits of what we can do from the Intel end.
Based on the comments from Thomas it sounds like there shouldn't be any
issues specifically with ARM that prevent the use of IGB PCIe devices,
and the fact is the error you are seeing "Unhandled fault: imprecise
external abort (0x1406) at 0x00000000" can indicate some sort of bus
fault.  It would probably be best to work with someone more familiar
with the inner workings of ARM CPUs as they might be able to work out
some other workarounds for the external abort.
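
For reference, the 0x1406 in that message is the fault status value the
kernel prints alongside the fault name. A minimal Python sketch of decoding
it follows, assuming the ARMv7 short-descriptor DFSR layout (FS[4] in bit
10, FS[3:0] in bits 3..0, WnR in bit 11, ExT in bit 12). The function name
and the two-entry name table are illustrative only.

```python
# Only two fault-status encodings are listed here; the full table is larger.
FAULT_NAMES = {
    0x08: "synchronous external abort",
    0x16: "asynchronous (imprecise) external abort",
}


def decode_fsr(fsr):
    """Decode the fields of a short-descriptor fault status value."""
    fs = ((fsr >> 6) & 0x10) | (fsr & 0x0F)  # FS[4] is bit 10, FS[3:0] are bits 3..0
    return {
        "fs": fs,
        "name": FAULT_NAMES.get(fs, "other"),
        "write": bool(fsr & (1 << 11)),  # WnR: set when the abort came from a write
        "ext": bool(fsr & (1 << 12)),    # ExT: implementation-defined external abort type
    }


print(decode_fsr(0x1406))
```

The address of 0x00000000 in the message is not meaningful for an imprecise
abort, since the fault is reported after the access that caused it.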

The fact that the Data Link Protocol error was there as well points to
some sort of hardware bus fault.  My advice would be to explore the
reason for why you are getting the Data Link Protocol error as this is
likely the source of the external abort you are seeing.  Beyond that
there isn't much more debugging we can do since this is likely a bus
issue related to your platform configuration.

Thanks,

Alex

* [E1000-devel] ARM support for igb driver
  2014-05-05 21:20                       ` Alexander Duyck
       [not found]                         ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>
@ 2014-05-06 15:18                         ` Arnd Bergmann
  2014-05-06 15:24                           ` Lucas Stach
  2014-06-03 14:49                         ` Ben Dooks
  2 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-05-06 15:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 05 May 2014, Alexander Duyck wrote:
> Glad to hear that this is working on your ARM platform as expected.
> 
> I believe the issue Shiv is having is due to a problem with the specific
> platform as the IGB device is reporting a Data Link Protocol error via
> AER and I believe this is what is causing his platform issues.  On
> enabling BME the device is likely signalling a Fatal Error message in
> response to the DLP error.  The original error he was seeing was:
> 
> Unhandled fault: imprecise external abort (0x1406) at 0x00000000

This isn't too uncommon. There are a couple of traditional PCI host drivers
that register an imprecise external abort handler to catch this and
then look at the host controller registers.

Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches
this error, but it then goes on to ignore it, not even printing
a message about it.

Shiv, which host controller driver are you using?

	Arnd

* [E1000-devel] ARM support for igb driver
  2014-05-06 15:18                         ` Arnd Bergmann
@ 2014-05-06 15:24                           ` Lucas Stach
  2014-05-06 15:33                             ` Arnd Bergmann
  0 siblings, 1 reply; 28+ messages in thread
From: Lucas Stach @ 2014-05-06 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

Am Dienstag, den 06.05.2014, 17:18 +0200 schrieb Arnd Bergmann:
> On Monday 05 May 2014, Alexander Duyck wrote:
> > Glad to hear that this is working on your ARM platform as expected.
> > 
> > I believe the issue Shiv is having is due to a problem with the specific
> > platform as the IGB device is reporting a Data Link Protocol error via
> > AER and I believe this is what is causing his platform issues.  On
> > enabling BME the device is likely signalling a Fatal Error message in
> > response to the DLP error.  The original error he was seeing was:
> > 
> > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> 
> This isn't too uncommon. There are a couple of traditional PCI host drivers
> that register an imprecise external abort handler to catch this and
> then look at the host controller registers.
> 
> Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches
> this error, but it then goes on to ignore it, not even printing
> a message about it.
> 
I think this handler is mostly there to handle the imprecise external
abort happening on DW pcie IP if the bus scan tries to access a
non-existent device. That's why it silently ignores this error.

BTW: I can confirm that igb i350 works on i.MX6.

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |

* [E1000-devel] ARM support for igb driver
  2014-05-06 15:24                           ` Lucas Stach
@ 2014-05-06 15:33                             ` Arnd Bergmann
  2014-05-30 11:50                               ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-05-06 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 06 May 2014 17:24:33 Lucas Stach wrote:
> Am Dienstag, den 06.05.2014, 17:18 +0200 schrieb Arnd Bergmann:
> > On Monday 05 May 2014, Alexander Duyck wrote:
> > > Glad to hear that this is working on your ARM platform as expected.
> > > 
> > > I believe the issue Shiv is having is due to a problem with the specific
> > > platform as the IGB device is reporting a Data Link Protocol error via
> > > AER and I believe this is what is causing his platform issues.  On
> > > enabling BME the device is likely signalling a Fatal Error message in
> > > response to the DLP error.  The original error he was seeing was:
> > > 
> > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> > 
> > This isn't too uncommon. There are a couple of traditional PCI host drivers
> > that register an imprecise external abort handler to catch this and
> > then look at the host controller registers.
> > 
> > Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches
> > this error, but it then goes on to ignore it, not even printing
> > a message about it.
> > 
> I think this handler is mostly there to handle the imprecise external
> abort happening on DW pcie IP if the bus scan tries to access a
> non-existent device. That's why it silently ignores this error.

That sounds rather dangerous, the driver should probably check for
the particular condition it tries to avoid and print a debug message
in that case, or halt the machine if it finds any unknown error, to
prevent propagation of incorrect data.

	Arnd

* [E1000-devel] ARM support for igb driver
  2014-05-06 15:33                             ` Arnd Bergmann
@ 2014-05-30 11:50                               ` shiv prakash Agarwal
  2014-05-30 17:21                                 ` Alexander Duyck
  0 siblings, 1 reply; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-05-30 11:50 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks all,

Finally we see that this hang occurs because some VDM is sent by this I210 card.
Why does this card send VDMs, and how can we disable them?

On Tue, May 6, 2014 at 9:03 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 06 May 2014 17:24:33 Lucas Stach wrote:
>> Am Dienstag, den 06.05.2014, 17:18 +0200 schrieb Arnd Bergmann:
>> > On Monday 05 May 2014, Alexander Duyck wrote:
>> > > Glad to hear that this is working on your ARM platform as expected.
>> > >
>> > > I believe the issue Shiv is having is due to a problem with the specific
>> > > platform as the IGB device is reporting a Data Link Protocol error via
>> > > AER and I believe this is what is causing his platform issues.  On
>> > > enabling BME the device is likely signalling a Fatal Error message in
>> > > response to the DLP error.  The original error he was seeing was:
>> > >
>> > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
>> >
>> > This isn't too uncommon. There are a couple of traditional PCI host drivers
>> > that register an imprecise external abort handler to catch this and
>> > then look at the host controller registers.
>> >
>> > Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches
>> > this error, but it then goes on to ignore it, not even printing
>> > a message about it.
>> >
>> I think this handler is mostly there to handle the imprecise external
>> abort happening on DW pcie IP if the bus scan tries to access a
>> non-existent device. That's why it silently ignores this error.
>
> That sounds rather dangerous, the driver should probably check for
> the particular condition it tries to avoid and print a debug message
> in that case, or halt the machine if it finds any unknown error, to
> prevent propagation of incorrect data.
>
>         Arnd

* [E1000-devel] ARM support for igb driver
  2014-05-30 11:50                               ` shiv prakash Agarwal
@ 2014-05-30 17:21                                 ` Alexander Duyck
  2014-05-30 18:02                                   ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Alexander Duyck @ 2014-05-30 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/30/2014 04:50 AM, shiv prakash Agarwal wrote:
> Thanks all,
> 
> Finally we see that this hang occurs because some VDM is sent by this I210 card.
> Why does this card send VDMs, and how can we disable them?


I'm not sure what you mean by VDM?  Are you referring to the AER error
message that is sent by the part?  If so I believe this is being sent
because the I210 is either misconfigured or because the platform is
violating PCIe spec in some way that is triggering the device to send an
error message.  Remember the key bit in all of this is the status of the
device before you load the driver:

>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

That DLP+ indicates an error occurred and that the device will send an
error message as soon as the bus mastering is enabled.

One thing I would recommend trying is clearing the UESta and UESvrt bits
so that they all read as 0, or - in the lspci dump.  Then you might try
resetting the part via the sysfs reset control and verify that those
bits are still cleared.  However at this point it seems like this
platform you are running the part in has some PCIe issues and that is
beyond the scope of what we can really debug from the driver and OS
stack.  To resolve it you would likely need a PCIe protocol analyzer so
you could see what the DLP error actually was.
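
To verify that the bits stay cleared after the reset, the lspci output can
be checked mechanically. A minimal Python sketch that parses one lspci flag
line and lists the flags still set follows. The function name parse_flags
is made up for illustration; the sample line is the UESta line from the dump
above.

```python
def parse_flags(line):
    """Parse an lspci flag line such as 'UESta:  DLP+ SDES-' into (name, flags)."""
    name, _, rest = line.strip().partition(":")
    flags = {}
    for token in rest.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return name, flags


line = ("UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- "
        "RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-")
name, flags = parse_flags(line)
still_set = [flag for flag, is_set in flags.items() if is_set]
print(name, still_set)
```

An empty list for UESta before loading the driver is the condition to aim
for.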

Thanks,

Alex

* [E1000-devel] ARM support for igb driver
  2014-05-30 17:21                                 ` Alexander Duyck
@ 2014-05-30 18:02                                   ` shiv prakash Agarwal
  2014-05-30 19:18                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-05-30 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

VDM = Vendor Defined Message

On Fri, May 30, 2014 at 10:51 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> On 05/30/2014 04:50 AM, shiv prakash Agarwal wrote:
>> Thanks all,
>>
>> Finally we see that this hang occurs because some VDM is sent by this I210 card.
>> Why does this card send VDMs, and how can we disable them?
>
>
> I'm not sure what you mean by VDM?  Are you referring to the AER error
> message that is sent by the part?  If so I believe this is being sent
> because the I210 is either misconfigured or because the platform is
> violating PCIe spec in some way that is triggering the device to send an
> error message.  Remember the key bit in all of this is the status of the
> device before you load the driver:
>
>>         Capabilities: [100 v2] Advanced Error Reporting
>>                 UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>
> That DLP+ indicates an error occurred and that the device will send an
> error message as soon as the bus mastering is enabled.
>
> One thing I would recommend trying is clearing the UESta and UESvrt bits
> so that they all read as 0, or - in the lspci dump.  Then you might try
> resetting the part via the sysfs reset control and verify that those
> bits are still cleared.  However at this point it seems like this
> platform you are running the part in has some PCIe issues and that is
> beyond the scope of what we can really debug from the driver and OS
> stack.  To resolve it you would likely need a PCIe protocol analyzer so
> you could see what the DLP error actually was.
>
> Thanks,
>
> Alex
>
>
>

* [E1000-devel] ARM support for igb driver
  2014-05-30 18:02                                   ` shiv prakash Agarwal
@ 2014-05-30 19:18                                     ` Jason Gunthorpe
  2014-05-30 19:35                                       ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2014-05-30 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 30, 2014 at 11:32:34PM +0530, shiv prakash Agarwal wrote:
> Finally we see that this hang occurs because some VDM is sent by
> this I210 card.  Why this card sends VDM? and how can we disable it?

> VDM = Vendor Defined Message

FWIW, when I last looked at Intel stuff in an analyzer it was sending
regular VDMs for some purpose.

Type 1 VDMs should not cause any errors, the root port should just
silently discard them.

If your root port is returning an error completion or otherwise from a
type 1 VDM then it is broken and the device would be properly
asserting DLP. Some work around would be required for that kind of HW
defect :|

I once tracked down a similar bug with VDM handling in a PCI-E
device..

Jason


* [E1000-devel] ARM support for igb driver
  2014-05-30 19:18                                     ` Jason Gunthorpe
@ 2014-05-30 19:35                                       ` shiv prakash Agarwal
  2014-05-30 19:56                                         ` Arnd Bergmann
  0 siblings, 1 reply; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-05-30 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks Jason,

Is there a way to disable sending VDM by Intel card?
And what is the purpose of sending type 1 VDM if it has to be
discarded by root port anyway?

On Sat, May 31, 2014 at 12:48 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Fri, May 30, 2014 at 11:32:34PM +0530, shiv prakash Agarwal wrote:
>> Finally we see that this hang occurs because some VDM is sent by
>> this I210 card.  Why this card sends VDM? and how can we disable it?
>
>> VDM = Vendor Defined Message
>
> FWIW, when I last looked at Intel stuff in an analyzer it was sending
> regular VDMs for some purpose.
>
> Type 1 VDMs should not cause any errors, the root port should just
> silently discard them.
>
> If your root port is returning an error completion or otherwise from a
> type 1 VDM then it is broken and the device would be properly
> asserting DLP. Some work around would be required for that kind of HW
> defect :|
>
> I once tracked down a similar bug with VDM handling in a PCI-E
> device..
>
> Jason


* [E1000-devel] ARM support for igb driver
  2014-05-30 19:35                                       ` shiv prakash Agarwal
@ 2014-05-30 19:56                                         ` Arnd Bergmann
  2014-05-30 20:14                                           ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-05-30 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote:
> Thanks Jason,
> 
> Is there a way to disable sending VDM by Intel card?
> And what is the purpose of sending type 1 VDM if it has to be
> discarded by root port anyway?

I think you should really just disable the behavior at the root port.
Which host driver are you using (sorry if I forgot and you already mentioned
it)?

	Arnd


* [E1000-devel] ARM support for igb driver
  2014-05-30 19:56                                         ` Arnd Bergmann
@ 2014-05-30 20:14                                           ` shiv prakash Agarwal
  2014-05-30 21:11                                             ` Fujinaka, Todd
  2014-05-31 18:34                                             ` Arnd Bergmann
  0 siblings, 2 replies; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-05-30 20:14 UTC (permalink / raw)
  To: linux-arm-kernel

Not sure about the root port. Can't it be disabled by the Intel device?

On Sat, May 31, 2014 at 1:26 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote:
>> Thanks Jason,
>>
>> Is there a way to disable sending VDM by Intel card?
>> And what is the purpose of sending type 1 VDM if it has to be
>> discarded by root port anyway?
>
> I think you should really just disable the behavior at the root port.
> Which host driver are you using (sorry if I forgot and you already mentioned
> it)?
>
>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-05-30 20:14                                           ` shiv prakash Agarwal
@ 2014-05-30 21:11                                             ` Fujinaka, Todd
  2014-05-31 18:34                                             ` Arnd Bergmann
  1 sibling, 0 replies; 28+ messages in thread
From: Fujinaka, Todd @ 2014-05-30 21:11 UTC (permalink / raw)
  To: linux-arm-kernel

It just took me one Google search to find that VDMs are being used for MCTP. The datasheet for the i210 (which I'm sure I suggested you consult) discusses MCTP in section 10.7.

I can't see a way to turn off MCTP.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujinaka at intel.com
(503) 712-4565

-----Original Message-----
From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com] 
Sent: Friday, May 30, 2014 1:14 PM
To: Arnd Bergmann
Cc: linux-arm-kernel at lists.infradead.org; Jason Gunthorpe; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Fujinaka, Todd; Lucas Stach
Subject: Re: [E1000-devel] ARM support for igb driver

Not sure about the root port. Can't it be disabled by the Intel device?

On Sat, May 31, 2014 at 1:26 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote:
>> Thanks Jason,
>>
>> Is there a way to disable sending VDM by Intel card?
>> And what is the purpose of sending type 1 VDM if it has to be 
>> discarded by root port anyway?
>
> I think you should really just disable the behavior at the root port.
> Which host driver are you using (sorry if I forgot and you already 
> mentioned it)?
>
>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-05-30 20:14                                           ` shiv prakash Agarwal
  2014-05-30 21:11                                             ` Fujinaka, Todd
@ 2014-05-31 18:34                                             ` Arnd Bergmann
  2014-06-01  5:56                                               ` shiv prakash Agarwal
  1 sibling, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-05-31 18:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Saturday 31 May 2014 01:44:28 shiv prakash Agarwal wrote:
> Not sure about the root port. Can't it be disabled by the Intel device?

My point is that it's the wrong place to disable it: every device
is allowed to generate this type of VDMs, and the root port is
supposed to silently ignore them if it doesn't handle them.

If the root port doesn't do that, it's a bug in the host bridge
driver, not in some device driver that happens to operate a
device within the specification.

Which host bridge driver do you use?

	Arnd


* [E1000-devel] ARM support for igb driver
  2014-05-31 18:34                                             ` Arnd Bergmann
@ 2014-06-01  5:56                                               ` shiv prakash Agarwal
  2014-06-01 11:46                                                 ` Arnd Bergmann
  0 siblings, 1 reply; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-06-01  5:56 UTC (permalink / raw)
  To: linux-arm-kernel

I don't see all devices send VDMs, then why Intel I-210?
Also, Is it a bug in host bridge hardware or driver? If hardware, how
can we make device not to send it?

On Sun, Jun 1, 2014 at 12:04 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 31 May 2014 01:44:28 shiv prakash Agarwal wrote:
>> Not sure about the root port. Can't it be disabled by the Intel device?
>
> My point is that it's the wrong place to disable it: every device
> is allowed to generate this type of VDMs, and the root port is
> supposed to silently ignore them if it doesn't handle them.
>
> If the root port doesn't do that, it's a bug in the host bridge
> driver, not in some device driver that happens to operate a
> device within the specification.
>
> Which host bridge driver do you use?
>
>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-01  5:56                                               ` shiv prakash Agarwal
@ 2014-06-01 11:46                                                 ` Arnd Bergmann
  2014-06-02  4:53                                                   ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-06-01 11:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote:
> I don't see all devices send VDMs, then why Intel I-210?

There is no obligation to do it of course.

> Also, Is it a bug in host bridge hardware or driver? If hardware, how
> can we make device not to send it?

If the hardware cannot handle them, it's a hardware bug. If the hardware
does handle them correctly but the software doesn't, that is a bug in
the bridge driver.

We have a couple of host bridge drivers that register a trap handler
and then look at the bridge registers to determine the exact cause.

Which host bridge driver do you use?

	Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-01 11:46                                                 ` Arnd Bergmann
@ 2014-06-02  4:53                                                   ` shiv prakash Agarwal
  2014-06-02 16:05                                                     ` Fujinaka, Todd
  0 siblings, 1 reply; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-06-02  4:53 UTC (permalink / raw)
  To: linux-arm-kernel

Yes, it's a hardware bug. I need to know whether we can disable it from
the device side. If yes, how?

On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote:
>> I don't see all devices send VDMs, then why Intel I-210?
>
> There is no obligation to do it of course.
>
>> Also, Is it a bug in host bridge hardware or driver? If hardware, how
>> can we make device not to send it?
>
> If the hardware cannot handle them, it's a hardware bug. If the hardware
> does handle them correctly but the software doesn't, that is a bug in
> the bridge driver.
>
> We have a couple of host bridge drivers that register a trap handler
> and then look at the bridge registers to determine the exact cause.
>
> Which host bridge driver do you use?
>
>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-02  4:53                                                   ` shiv prakash Agarwal
@ 2014-06-02 16:05                                                     ` Fujinaka, Todd
  2014-06-03 12:33                                                       ` shiv prakash Agarwal
  0 siblings, 1 reply; 28+ messages in thread
From: Fujinaka, Todd @ 2014-06-02 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

There is no hardware bug. The PCIe spec allows VDMs. Note Section 2.2.8.6 where there appear to be a couple of options.

- (Receivers) Completers silently discard Vendor_Defined Type 1 Messages which they are not designed to receive - this is not an error condition.
- (Receivers) Completers handle the receipt of an unsupported Vendor_Defined Type 0 Message as an Unsupported Request, and the error is reported according to Section 6.2.
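
The two receiver rules above can be sketched as a small decision function. This is only an illustration of the rule as quoted, not code from any root port or driver; the function name and the returned strings are made up here, while the two message codes (0x7E for Vendor_Defined Type 0, 0x7F for Vendor_Defined Type 1) are the values the PCIe specification assigns to these messages.

```python
# Message Code field values for vendor-defined messages.
VENDOR_DEFINED_TYPE0 = 0x7E
VENDOR_DEFINED_TYPE1 = 0x7F


def handle_unsupported_vdm(message_code):
    """What a receiver does with a VDM it was not designed to receive."""
    if message_code == VENDOR_DEFINED_TYPE1:
        # Type 1: silently discard, no error is logged or signalled.
        return "discard"
    if message_code == VENDOR_DEFINED_TYPE0:
        # Type 0: treated as an Unsupported Request and reported as an error.
        return "unsupported_request"
    raise ValueError("not a vendor-defined message code: 0x%02x" % message_code)


if __name__ == "__main__":
    for code in (VENDOR_DEFINED_TYPE0, VENDOR_DEFINED_TYPE1):
        print("0x%02x -> %s" % (code, handle_unsupported_vdm(code)))
```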

I think you may have MCTP enabled and you should be able to disable it in the EEPROM. I will need a lot more information about your system and whether the i210 is a LOM (LAN-on-motherboard, soldered onto your motherboard) or a NIC (what we call a plug-in PCIe card). Either way, you probably won't be able to get it changed without a working OS.

If it's a NIC, you can take it out and put it in a non-ARM Linux system and send me a dump of your current EEPROM.
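
One common way to produce such a dump on Linux is ethtool's EEPROM dump option (ethtool -e). The sketch below only wraps that command; the interface name eth0 and the output file name are assumptions, the helper names are made up for illustration, and the command normally has to be run as root on the machine where the card works.

```python
import subprocess


def eeprom_dump_cmd(ifname, raw=False):
    """Build the ethtool command line for an EEPROM dump.

    With raw=False ethtool prints a hex dump; "raw on" makes it emit
    the binary contents instead.
    """
    cmd = ["ethtool", "-e", ifname]
    if raw:
        cmd += ["raw", "on"]
    return cmd


def dump_eeprom(ifname, path):
    """Write the raw EEPROM contents of ifname to path."""
    with open(path, "wb") as out:
        subprocess.run(eeprom_dump_cmd(ifname, raw=True), stdout=out, check=True)


if __name__ == "__main__":
    # "eth0" and the file name are placeholders for this example.
    print(" ".join(eeprom_dump_cmd("eth0")))
    print(" ".join(eeprom_dump_cmd("eth0", raw=True)) + " > i210-eeprom.bin")
```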

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujinaka at intel.com
(503) 712-4565

-----Original Message-----
From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com] 
Sent: Sunday, June 01, 2014 9:54 PM
To: Arnd Bergmann
Cc: linux-arm-kernel at lists.infradead.org; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Jason Gunthorpe; Fujinaka, Todd; Lucas Stach
Subject: Re: [E1000-devel] ARM support for igb driver

Yes, it's a hardware bug. I need to know whether we can disable it from the device side. If yes, how?

On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote:
>> I don't see all devices send VDMs, then why Intel I-210?
>
> There is no obligation to do it of course.
>
>> Also, Is it a bug in host bridge hardware or driver? If hardware, how 
>> can we make device not to send it?
>
> If the hardware cannot handle them, it's a hardware bug. If the 
> hardware does handle them correctly but the software doesn't, that is 
> a bug in the bridge driver.
>
> We have a couple of host bridge drivers that register a trap handler 
> and then look at the bridge registers to determine the exact cause.
>
> Which host bridge driver do you use?
>
>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-02 16:05                                                     ` Fujinaka, Todd
@ 2014-06-03 12:33                                                       ` shiv prakash Agarwal
  0 siblings, 0 replies; 28+ messages in thread
From: shiv prakash Agarwal @ 2014-06-03 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks,

Yes it is a NIC. How to get dump of its EEPROM?

On Mon, Jun 2, 2014 at 9:35 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote:
> There is no hardware bug. The PCIe spec allows VDMs. Note Section 2.2.8.6 where there appear to be a couple of options.
>
> - (Receivers) Completers silently discard Vendor_Defined Type 1 Messages which they are not designed to receive - this is not an error condition.
> - (Receivers) Completers handle the receipt of an unsupported Vendor_Defined Type 0 Message as an Unsupported Request, and the error is reported according to Section 6.2.
>
> I think you may have MCTP enabled and you should be able to disable it in the EEPROM. I will need a lot more information about your system and whether the i210 is a LOM (LAN-on-motherboard, soldered onto your motherboard) or a NIC (what we call a plug-in PCIe card). Either way, you probably won't be able to get it changed without a working OS.
>
> If it's a NIC, you can take it out and put it in a non-ARM Linux system and send me a dump of your current EEPROM.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujinaka at intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com]
> Sent: Sunday, June 01, 2014 9:54 PM
> To: Arnd Bergmann
> Cc: linux-arm-kernel at lists.infradead.org; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Jason Gunthorpe; Fujinaka, Todd; Lucas Stach
> Subject: Re: [E1000-devel] ARM support for igb driver
>
> Yes, it's a hardware bug. I need to know whether we can disable it from the device side. If yes, how?
>
> On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote:
>>> I don't see all devices send VDMs, then why Intel I-210?
>>
>> There is no obligation to do it of course.
>>
>>> Also, Is it a bug in host bridge hardware or driver? If hardware, how
>>> can we make device not to send it?
>>
>> If the hardware cannot handle them, it's a hardware bug. If the
>> hardware does handle them correctly but the software doesn't, that is
>> a bug in the bridge driver.
>>
>> We have a couple of host bridge drivers that register a trap handler
>> and then look at the bridge registers to determine the exact cause.
>>
>> Which host bridge driver do you use?
>>
>>         Arnd


* [E1000-devel] ARM support for igb driver
  2014-05-05 21:20                       ` Alexander Duyck
       [not found]                         ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>
  2014-05-06 15:18                         ` Arnd Bergmann
@ 2014-06-03 14:49                         ` Ben Dooks
  2014-06-03 15:05                           ` Arnd Bergmann
  2 siblings, 1 reply; 28+ messages in thread
From: Ben Dooks @ 2014-06-03 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/05/14 22:20, Alexander Duyck wrote:
> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
>> Dear Alexander Duyck,
>>
>> On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote:
>>
>>>> 1. So overall issue is any memory/config space access hangs(logs above)
>>>> if bus master enable bit is set on IGB NIC card,this is not observed
>>>> with E1000E NIC cards on same platform.
>>>>
>>>> 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not
>>>> sure how much its related to ARM though.
>>>>
>>>> 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am
>>>> not sure if this has anything to do with above issue.
>>>> RC config is same for both cases.
>>>>
>>>> IGB / E1000E
>>>>
>>>> Command Status: INTx+/INTx-
>>>> PM Status:           NoSoftRst+/NoSoftRst-
>>>> DevCap:                FLReset-/FLReset+
>>>> No Dev/Link2 Cap/Sta Registers for E1000E
>>>> Some differences in AER Registers
>>>>
>>>> 4. Any idea, if this card is verified on ARM by anybody?
>>>>
>>>
>>> It seems like you are glossing over the obvious issue.  You said it
>>> yourself, this works fine on x86.  Therefore this is likely VERY related
>>> to ARM, or at least your specific ARM platform configuration.
>>
>> Since I haven't seen the beginning of the thread, I might be completely
>> off topic. However, I wanted to mention that I have successfully used
>> and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is
>> useful, I'd be happy to provide you with additional details upon
>> request.
>>
>> Best regards,
>>
>> Thomas
>>
> 
> 
> Thomas,
> 
> Glad to hear that this is working on your ARM platform as expected.
> 
> I believe the issue Shiv is having is due to a problem with the specific
> platform as the IGB device is reporting a Data Link Protocol error via
> AER and I believe this is what is causing his platform issues.  On
> enabling BME the device is likely signalling a Fatal Error message in
> response to the DLP error.  The original error he was seeing was:
> 
> Unhandled fault: imprecise external abort (0x1406) at 0x00000000

I should sort out making these errors non-fatal to the system, there's
not really much point in killing a process that may not have been the
initiator of the problem.
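
For reference, the 0x1406 in that message is the fault status value. A minimal sketch of how it breaks down, assuming the ARMv7 short-descriptor DFSR layout (FS[3:0] in bits 3:0, FS[4] in bit 10, WnR in bit 11, ExT in bit 12); the function and dictionary names are made up for this example:

```python
# Only the fault status encoding relevant to this thread is listed.
FS_NAMES = {
    0b10110: "asynchronous (imprecise) external abort",
}


def decode_fsr(fsr):
    """Split a short-descriptor fault status value into its fields."""
    fs = (fsr & 0xF) | (((fsr >> 10) & 1) << 4)  # FS[4] lives in bit 10
    return {
        "fs": fs,
        "name": FS_NAMES.get(fs, "other"),
        "write": bool(fsr & (1 << 11)),     # WnR: abort caused by a write
        "external": bool(fsr & (1 << 12)),  # ExT: implementation-defined
    }


if __name__ == "__main__":
    print(decode_fsr(0x1406))
```

The status value says what kind of abort was taken, but, being asynchronous, it does not identify the instruction that triggered it.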


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [E1000-devel] ARM support for igb driver
  2014-05-06 14:58                           ` Alexander Duyck
@ 2014-06-03 14:57                             ` Ben Dooks
  0 siblings, 0 replies; 28+ messages in thread
From: Ben Dooks @ 2014-06-03 14:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/05/14 15:58, Alexander Duyck wrote:
> Shiv,
> 
> I think we are at the limits of what we can do from the Intel end.
> Based on the comments from Thomas it sounds like there shouldn't be any
> issues specifically with ARM that prevent the use of IGB PCIe devices,
> and the fact is the error you are seeing "Unhandled fault: imprecise
> external abort (0x1406) at 0x00000000" can indicate some sort of bus
> fault.  It would probably be best to work with someone more familiar
> with the inner workings of ARM CPUs as they might be able to work out
> some other workarounds for the external abort.

I did send a pair of patches for this, as imprecise aborts are not
traceable to their source instruction and there is a case that it
should not abort any tasks.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [E1000-devel] ARM support for igb driver
  2014-06-03 14:49                         ` Ben Dooks
@ 2014-06-03 15:05                           ` Arnd Bergmann
  2014-06-03 15:13                             ` Ben Dooks
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-06-03 15:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote:
> On 05/05/14 22:20, Alexander Duyck wrote:
> > On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
> > 
> > Glad to hear that this is working on your ARM platform as expected.
> > 
> > I believe the issue Shiv is having is due to a problem with the specific
> > platform as the IGB device is reporting a Data Link Protocol error via
> > AER and I believe this is what is causing his platform issues.  On
> > enabling BME the device is likely signalling a Fatal Error message in
> > response to the DLP error.  The original error he was seeing was:
> > 
> > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> 
> I should sort out making these errors non-fatal to the system, there's
> not really much point in killing a process that may not have been the
> initiator of the problem.

We really shouldn't catch those errors system-wide, it belongs into
the specific host bridge driver, but Shiv refuses to say which one that
is, so we can't fix it.

	Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-03 15:05                           ` Arnd Bergmann
@ 2014-06-03 15:13                             ` Ben Dooks
  2014-06-03 15:23                               ` Arnd Bergmann
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Dooks @ 2014-06-03 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/06/14 16:05, Arnd Bergmann wrote:
> On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote:
>> On 05/05/14 22:20, Alexander Duyck wrote:
>>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
>>>
>>> Glad to hear that this is working on your ARM platform as expected.
>>>
>>> I believe the issue Shiv is having is due to a problem with the specific
>>> platform as the IGB device is reporting a Data Link Protocol error via
>>> AER and I believe this is what is causing his platform issues.  On
>>> enabling BME the device is likely signalling a Fatal Error message in
>>> response to the DLP error.  The original error he was seeing was:
>>>
>>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000
>>
>> I should sort out making these errors non-fatal to the system, there's
>> not really much point in killing a process that may not have been the
>> initiator of the problem.
> 
> We really shouldn't catch those errors system-wide, it belongs into
> the specific host bridge driver, but Shiv refuses to say which one that
> is, so we can't fix it.

I am not sure what else we can do; we either have to have a
default null handler, or log that they have happened.

The whole issue with the "imprecise external" part is that you have
no idea what instruction (or core) caused the issue and IIRC there is
very little information about what actually sent the abort.

I believe these are useful to report as they tend to show that some
part of the system has gone wrong. For example, we get them on the
rcar-h2 if the system tries to access a unit that has not been
properly clocked.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


* [E1000-devel] ARM support for igb driver
  2014-06-03 15:13                             ` Ben Dooks
@ 2014-06-03 15:23                               ` Arnd Bergmann
  2014-06-03 15:31                                 ` Ben Dooks
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-06-03 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 03 June 2014 16:13:07 Ben Dooks wrote:
> On 03/06/14 16:05, Arnd Bergmann wrote:
> > On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote:
> >> On 05/05/14 22:20, Alexander Duyck wrote:
> >>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
> >>>
> >>> Glad to hear that this is working on your ARM platform as expected.
> >>>
> >>> I believe the issue Shiv is having is due to a problem with the specific
> >>> platform as the IGB device is reporting a Data Link Protocol error via
> >>> AER and I believe this is what is causing his platform issues.  On
> >>> enabling BME the device is likely signalling a Fatal Error message in
> >>> response to the DLP error.  The original error he was seeing was:
> >>>
> >>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> >>
> >> I should sort out making these errors non-fatal to the system, there's
> >> not really much point in killing a process that may not have been the
> >> initiator of the problem.
> > 
> > We really shouldn't catch those errors system-wide, it belongs into
> > the specific host bridge driver, but Shiv refuses to say which one that
> > is, so we can't fix it.
> 
> I am not sure what else we can do; we either have to have a
> default null handler, or log that they have happened.
> 
> The whole issue with the "imprecise external" part is that you have
> no idea what instruction (or core) caused the issue and IIRC there is
> very little information about what actually sent the abort.
> 
> I believe these are useful to report as they tend to show that some
> part of the system has gone wrong. For example, we get them on the
> rcar-h2 if the system tries to access a unit that has not been
> properly clocked.

In my experience, any unit that can send such an abort also has
a diagnostic register that you can look into to find out at least the
unit that triggered it so you can disable it.

If none of the known sources caused the abort, it's generally best
to shut down the system to avoid further data corruption. 
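
The policy described here can be modelled in a few lines. This is a plain Python model of the decision, not kernel code: the source names and the check callbacks (standing in for "read that unit's diagnostic register") are hypothetical.

```python
def handle_external_abort(sources):
    """Decide what to do with an imprecise external abort.

    sources is a list of (name, caused_abort) pairs, where caused_abort()
    stands in for reading that unit's diagnostic register and returns
    True if the unit reports that it raised the abort.
    """
    for name, caused_abort in sources:
        if caused_abort():
            # A known unit owns the error: let its driver deal with it.
            return "handled by " + name
    # Nobody claims it: stop rather than risk further data corruption.
    return "shutdown"


if __name__ == "__main__":
    sources = [
        ("pcie-host-bridge", lambda: True),   # pretend the bridge latched an error
        ("other-bus-unit", lambda: False),
    ]
    print(handle_external_abort(sources))
    print(handle_external_abort([]))
```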

	Arnd


* [E1000-devel] ARM support for igb driver
  2014-06-03 15:23                               ` Arnd Bergmann
@ 2014-06-03 15:31                                 ` Ben Dooks
  0 siblings, 0 replies; 28+ messages in thread
From: Ben Dooks @ 2014-06-03 15:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/06/14 16:23, Arnd Bergmann wrote:
> On Tuesday 03 June 2014 16:13:07 Ben Dooks wrote:
>> On 03/06/14 16:05, Arnd Bergmann wrote:
>>> On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote:
>>>> On 05/05/14 22:20, Alexander Duyck wrote:
>>>>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote:
>>>>>
>>>>> Glad to hear that this is working on your ARM platform as expected.
>>>>>
>>>>> I believe the issue Shiv is having is due to a problem with the specific
>>>>> platform as the IGB device is reporting a Data Link Protocol error via
>>>>> AER and I believe this is what is causing his platform issues.  On
>>>>> enabling BME the device is likely signalling a Fatal Error message in
>>>>> response to the DLP error.  The original error he was seeing was:
>>>>>
>>>>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000
>>>>
>>>> I should sort out making these errors non-fatal to the system, there's
>>>> not really much point in killing a process that may not have been the
>>>> initiator of the problem.
>>>
>>> We really shouldn't catch those errors system-wide, it belongs into
>>> the specific host bridge driver, but Shiv refuses to say which one that
>>> is, so we can't fix it.
>>
>> I am not sure what else we can do; we either have to have a
>> default null handler, or log that they have happened.
>>
>> The whole issue with the "imprecise external" part is that you have
>> no idea what instruction (or core) caused the issue and IIRC there is
>> very little information about what actually sent the abort.
>>
>> I believe these are useful to report as they tend to show that some
>> part of the system has gone wrong. For example, we get them on the
>> rcar-h2 if the system tries to access a unit that has not been
>> properly clocked.
> 
> In my experience, any unit that can send such an abort also has
> a diagnostic register that you can look into to find out at least the
> unit that triggered it so you can disable it.

Not on the rcar-h2.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius


end of thread, other threads:[~2014-06-03 15:31 UTC | newest]

Thread overview: 28+ messages
     [not found] <CAHH3p5KAwyMOY=ai_h0eM+JwYLtTO8d4jVaVOWVtOakhqPRTew@mail.gmail.com>
     [not found] ` <CF7EA37A.4676A%matthew.vick@intel.com>
     [not found]   ` <CAHH3p5JFc3jgeK7OFd1rtgDS3gjRdjiPr3LwwqfyKMPGSTwMWA@mail.gmail.com>
     [not found]     ` <CF7FD060.46A87%matthew.vick@intel.com>
     [not found]       ` <CAHH3p5J8V6uWEzbCoT+x-6_2x1q=nc=_oMn0qnnj113FmeuW6A@mail.gmail.com>
     [not found]         ` <9B4A1B1917080E46B64F07F2989DADD6533627A5@ORSMSX114.amr.corp.intel.com>
     [not found]           ` <CAHH3p5KEPXw5mxP-13evd774nyO5C13ky0=fCaZPCNVx1gh3Ow@mail.gmail.com>
     [not found]             ` <CAHH3p5LaRwgdECxsO4OZSdWEhGt232yj-3347kYEStP6NLg2xg@mail.gmail.com>
     [not found]               ` <5360041A.7030005@intel.com>
     [not found]                 ` <CAHH3p5L6WM3FaZ19Tw9vUcsk+kERfBKnGC2BRtnftJR1pbSF7g@mail.gmail.com>
2014-05-05 15:28                   ` [E1000-devel] ARM support for igb driver Alexander Duyck
     [not found]                     ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com>
2014-05-05 20:00                       ` Alexander Duyck
2014-05-05 20:38                     ` Thomas Petazzoni
2014-05-05 21:20                       ` Alexander Duyck
     [not found]                         ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>
2014-05-06 14:58                           ` Alexander Duyck
2014-06-03 14:57                             ` Ben Dooks
2014-05-06 15:18                         ` Arnd Bergmann
2014-05-06 15:24                           ` Lucas Stach
2014-05-06 15:33                             ` Arnd Bergmann
2014-05-30 11:50                               ` shiv prakash Agarwal
2014-05-30 17:21                                 ` Alexander Duyck
2014-05-30 18:02                                   ` shiv prakash Agarwal
2014-05-30 19:18                                     ` Jason Gunthorpe
2014-05-30 19:35                                       ` shiv prakash Agarwal
2014-05-30 19:56                                         ` Arnd Bergmann
2014-05-30 20:14                                           ` shiv prakash Agarwal
2014-05-30 21:11                                             ` Fujinaka, Todd
2014-05-31 18:34                                             ` Arnd Bergmann
2014-06-01  5:56                                               ` shiv prakash Agarwal
2014-06-01 11:46                                                 ` Arnd Bergmann
2014-06-02  4:53                                                   ` shiv prakash Agarwal
2014-06-02 16:05                                                     ` Fujinaka, Todd
2014-06-03 12:33                                                       ` shiv prakash Agarwal
2014-06-03 14:49                         ` Ben Dooks
2014-06-03 15:05                           ` Arnd Bergmann
2014-06-03 15:13                             ` Ben Dooks
2014-06-03 15:23                               ` Arnd Bergmann
2014-06-03 15:31                                 ` Ben Dooks
