* [E1000-devel] ARM support for igb driver [not found] ` <CAHH3p5L6WM3FaZ19Tw9vUcsk+kERfBKnGC2BRtnftJR1pbSF7g@mail.gmail.com> @ 2014-05-05 15:28 ` Alexander Duyck [not found] ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com> 2014-05-05 20:38 ` Thomas Petazzoni 0 siblings, 2 replies; 28+ messages in thread From: Alexander Duyck @ 2014-05-05 15:28 UTC (permalink / raw) To: linux-arm-kernel On 05/04/2014 11:55 PM, shiv prakash Agarwal wrote: > + linux-arm-kernel mailing list. > > Thanks Alex, > > 1. So overall issue is any memory/config space access hangs(logs above) > if bus master enable bit is set on IGB NIC card,this is not observed > with E1000E NIC cards on same platform. > > 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not > sure how much its related to ARM though. > > 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am > not sure if this has anything to do with above issue. > RC config is same for both cases. > > IGB / E1000E > > Command Status: INTx+/INTx- > PM Status: NoSoftRst+/NoSoftRst- > DevCap: FLReset-/FLReset+ > No Dev/Link2 Cap/Sta Registers for E1000E > Some differences in AER Registers > > 4. Any idea, if this card is verified on ARM by anybody? > It seems like you are glossing over the obvious issue. You said it yourself, this works fine on x86. Therefore this is likely VERY related to ARM, or at least your specific ARM platform configuration. You also mention "some differences in the AER Registers", how about you tell us what was different there since as I pointed out that could tell us if there is some error the device detected that is triggering the problem, or better yet could you just send us the lspci -vvv output from the problem system. That would give us much more to work with and help us to understand what the issue is. Thanks, Alex ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com>]
* [E1000-devel] ARM support for igb driver [not found] ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com> @ 2014-05-05 20:00 ` Alexander Duyck 0 siblings, 0 replies; 28+ messages in thread From: Alexander Duyck @ 2014-05-05 20:00 UTC (permalink / raw) To: linux-arm-kernel So like I said the AER tells the tale. Note this bit in your AER config on the IGB NIC: > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- You see the part that has "UESta: DLP+". That means that there was a Data Link Protocol error if I am not mistaken. As a result, as soon as you turn on the Bus Master Enable the device will issue a message indicating a Fatal Error to the root complex. I suspect your root complex is responding to the Fatal Error by hanging the system. My advice would be to first find out what is causing the DLP error and prevent it from happening. It is likely something related to the PCIe bus the device is connected to. Then in the meantime you might be able to also work around the issue by reading/writing the value from the Uncorrectable Status register back onto itself to clear the error bit and prevent the message from being sent. If nothing else you can probably just write all 0xFF's via setpci to the register to clear it. You just need to make sure none of the UESta bits are set before you set the BME. Thanks, Alex On 05/05/2014 11:34 AM, shiv prakash Agarwal wrote: > 1. Below is lspci output for IGB NIC and E1000E NIC > 2. 
Although we are seeing this on ARM platform, but we need to root > cause as to why this occurs? > > a) IGB NIC > 01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network > Connection (rev 03) > Subsystem: Intel Corporation Ethernet Server Adapter I210-T1 > Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- > ParErr+ Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort- >SERR- <PERR- INTx+ > Interrupt: pin A routed to IRQ 130 > Region 0: Memory at 32100000 (32-bit, non-prefetchable) [size=1M] > Region 3: Memory at 32200000 (32-bit, non-prefetchable) [size=16K] > [virtual] Expansion ROM at 12100000 [disabled] [size=1M] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > Address: 0000000000000000 Data: 0000 > Masking: 00000000 Pending: 00000000 > Capabilities: [70] MSI-X: Enable+ Count=5 Masked- > Vector table: BAR=3 offset=00000000 > PBA: BAR=3 offset=00002000 > Capabilities: [a0] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s > <512ns, L1 <64us > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ > DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ > Unsupported+ > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ > TransPend- > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, > Latency L0 <2us, L1 <16us > ClockPM- Surprise- LLActRep- BwNot- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- > CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ > DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ > DevCtl2: Completion Timeout: 50us to 
50ms, TimeoutDis- > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- > SpeedDis-, Selectable De-emphasis: -6dB > Transmit Margin: Normal Operating Range, > EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, > EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, > LinkEqualizationRequest- > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- > UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ > NonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- > NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ > ChkEn- > Capabilities: [140 v1] Device Serial Number a0-36-9f-ff-ff-24-64-ef > Capabilities: [1a0 v1] Transaction Processing Hints > Device specific mode supported > Steering table in TPH capability structure > Kernel driver in use: igb > > > b) E1000E NIC: > 01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network > Connection > Subsystem: Intel Corporation Gigabit CT2 Desktop Adapter > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr+ Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 130 > Region 0: Memory at 32180000 (32-bit, non-prefetchable) [size=128K] > Region 1: Memory at 32100000 (32-bit, non-prefetchable) [size=512K] > Region 2: I/O ports at 1000 [disabled] [size=32] > Region 3: Memory at 321a0000 (32-bit, non-prefetchable) [size=16K] > [virtual] Expansion ROM at 12100000 [disabled] [size=256K] > Capabilities: [c8] Power Management 
version 2 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- > Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+ > Address: 0000000000000000 Data: 0000 > Capabilities: [e0] Express (v1) Endpoint, MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s > <512ns, L1 <64us > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ > Unsupported+ > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 128 bytes, MaxReadReq 512 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ > TransPend- > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, > Latency L0 <128ns, L1 <64us > ClockPM- Surprise- LLActRep- BwNot- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- > CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ > DLActive- BWMgmt- ABWMgmt- > Capabilities: [a0] MSI-X: Enable+ Count=5 Masked- > Vector table: BAR=3 offset=00000000 > PBA: BAR=3 offset=00002000 > Capabilities: [100 v1] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- > UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- > NonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- > NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- > ChkEn- > Capabilities: [140 v1] Device Serial Number 68-05-ca-ff-ff-12-c3-cb > Kernel driver in use: e1000e > > > > On Mon, May 5, 2014 at 8:58 PM, Alexander Duyck > <alexander.h.duyck at intel.com <mailto:alexander.h.duyck@intel.com>> wrote: > > On 05/04/2014 11:55 PM, shiv prakash Agarwal wrote: > > + linux-arm-kernel mailing list. 
> > > > Thanks Alex, > > > > 1. So overall issue is any memory/config space access hangs(logs > above) > > if bus master enable bit is set on IGB NIC card,this is not observed > > with E1000E NIC cards on same platform. > > > > 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not > > sure how much its related to ARM though. > > > > 3. I saw below differences in lspci -vvv output b/w e1000e and > igb, I am > > not sure if this has anything to do with above issue. > > RC config is same for both cases. > > > > IGB / E1000E > > > > Command Status: INTx+/INTx- > > PM Status: NoSoftRst+/NoSoftRst- > > DevCap: FLReset-/FLReset+ > > No Dev/Link2 Cap/Sta Registers for E1000E > > Some differences in AER Registers > > > > 4. Any idea, if this card is verified on ARM by anybody? > > > > It seems like you are glossing over the obvious issue. You said it > yourself, this works fine on x86. Therefore this is likely VERY related > to ARM, or at least your specific ARM platform configuration. > > You also mention "some differences in the AER Registers", how about you > tell us what was different there since as I pointed out that could tell > us if there is some error the device detected that is triggering the > problem, or better yet could you just send us the lspci -vvv output from > the problem system. That would give us much more to work with and help > us to understand what the issue is. > > Thanks, > > Alex > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-05 15:28 ` [E1000-devel] ARM support for igb driver Alexander Duyck [not found] ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com> @ 2014-05-05 20:38 ` Thomas Petazzoni 2014-05-05 21:20 ` Alexander Duyck 1 sibling, 1 reply; 28+ messages in thread From: Thomas Petazzoni @ 2014-05-05 20:38 UTC (permalink / raw) To: linux-arm-kernel Dear Alexander Duyck, On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote: > > 1. So overall issue is any memory/config space access hangs(logs above) > > if bus master enable bit is set on IGB NIC card,this is not observed > > with E1000E NIC cards on same platform. > > > > 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not > > sure how much its related to ARM though. > > > > 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am > > not sure if this has anything to do with above issue. > > RC config is same for both cases. > > > > IGB / E1000E > > > > Command Status: INTx+/INTx- > > PM Status: NoSoftRst+/NoSoftRst- > > DevCap: FLReset-/FLReset+ > > No Dev/Link2 Cap/Sta Registers for E1000E > > Some differences in AER Registers > > > > 4. Any idea, if this card is verified on ARM by anybody? > > > > It seems like you are glossing over the obvious issue. You said it > yourself, this works fine on x86. Therefore this is likely VERY related > to ARM, or at least your specific ARM platform configuration. Since I haven't seen the beginning of the thread, I might be completely off topic. However, I wanted to mention that I have successfully used and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is useful, I'd be happy to provide you with additional details upon request. Best regards, Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-05 20:38 ` Thomas Petazzoni @ 2014-05-05 21:20 ` Alexander Duyck [not found] ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com> ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Alexander Duyck @ 2014-05-05 21:20 UTC (permalink / raw) To: linux-arm-kernel On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: > Dear Alexander Duyck, > > On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote: > >>> 1. So overall issue is any memory/config space access hangs(logs above) >>> if bus master enable bit is set on IGB NIC card,this is not observed >>> with E1000E NIC cards on same platform. >>> >>> 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not >>> sure how much its related to ARM though. >>> >>> 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am >>> not sure if this has anything to do with above issue. >>> RC config is same for both cases. >>> >>> IGB / E1000E >>> >>> Command Status: INTx+/INTx- >>> PM Status: NoSoftRst+/NoSoftRst- >>> DevCap: FLReset-/FLReset+ >>> No Dev/Link2 Cap/Sta Registers for E1000E >>> Some differences in AER Registers >>> >>> 4. Any idea, if this card is verified on ARM by anybody? >>> >> >> It seems like you are glossing over the obvious issue. You said it >> yourself, this works fine on x86. Therefore this is likely VERY related >> to ARM, or at least your specific ARM platform configuration. > > Since I haven't seen the beginning of the thread, I might be completely > off topic. However, I wanted to mention that I have successfully used > and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is > useful, I'd be happy to provide you with additional details upon > request. > > Best regards, > > Thomas > Thomas, Glad to hear that this is working on your ARM platform as expected. 
I believe the issue Shiv is having is due to a problem with the specific platform as the IGB device is reporting a Data Link Protocol error via AER and I believe this is what is causing his platform issues. On enabling BME the device is likely signalling a Fatal Error message in response to the DLP error. The original error he was seeing was: Unhandled fault: imprecise external abort (0x1406) at 0x00000000 Thanks, Alex ^ permalink raw reply [flat|nested] 28+ messages in thread
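For reference, the number in "imprecise external abort (0x1406)" can be decoded by hand. The following is a minimal sketch, assuming the value in parentheses is the ARMv7 fault status register in the short-descriptor format; the function name `decode_fsr` and the small name table are illustrative only and are not taken from the kernel.

```python
# Sketch: decode an ARMv7 short-descriptor fault status value such as 0x1406.
# Assumed layout: FS[3:0] in bits 3:0, FS[4] in bit 10, domain in bits 7:4,
# WnR in bit 11, ExT in bit 12. Only a few status encodings are listed.

FAULT_NAMES = {
    0b00001: "alignment fault",
    0b01000: "synchronous (precise) external abort",
    0b10110: "asynchronous (imprecise) external abort",
}


def decode_fsr(fsr):
    """Split a fault status value into its fields."""
    fs = (fsr & 0xF) | ((fsr >> 6) & 0x10)  # move bit 10 down to bit 4
    return {
        "fs": fs,
        "name": FAULT_NAMES.get(fs, "other"),
        "domain": (fsr >> 4) & 0xF,
        "write": bool(fsr & (1 << 11)),
        "ext": bool(fsr & (1 << 12)),
    }


if __name__ == "__main__":
    print(decode_fsr(0x1406))
```

Under these assumptions the status field comes out as 0b10110, the asynchronous (imprecise) encoding, which would be consistent with the reported address of 0x00000000 carrying no useful information.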
[parent not found: <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com>]
* [E1000-devel] ARM support for igb driver [not found] ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com> @ 2014-05-06 14:58 ` Alexander Duyck 2014-06-03 14:57 ` Ben Dooks 0 siblings, 1 reply; 28+ messages in thread From: Alexander Duyck @ 2014-05-06 14:58 UTC (permalink / raw) To: linux-arm-kernel Shiv, I think we are at the limits of what we can do from the Intel end. Based on the comments from Thomas it sounds like there shouldn't be any issues specifically with ARM that prevent the use of IGB PCIe devices, and the fact is the error you are seeing "Unhandled fault: imprecise external abort (0x1406) at 0x00000000" can indicate some sort of bus fault. It would probably be best to work with someone more familiar with the inner workings of ARM CPUs as they might be able to work out some other workarounds for the external abort. The fact that the Data Link Protocol error was there as well points to some sort of hardware bus fault. My advice would be to explore the reason why you are getting the Data Link Protocol error as this is likely the source of the external abort you are seeing. Beyond that there isn't much more debugging we can do since this is likely a bus issue related to your platform configuration. Thanks, Alex ^ permalink raw reply [flat|nested] 28+ messages in thread
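One way to check whether an uncorrectable error such as DLP is already latched before bus mastering is enabled is to scan saved `lspci -vvv` text for flags marked with `+`. The following is a minimal sketch; the helper names `set_flags` and `safe_to_enable_bus_master` are hypothetical, and the sample lines are copied from the I210 dump earlier in this thread.

```python
import re

# A flag looks like "DLP+" or "SDES-": a name followed by a sign.
FLAG = re.compile(r"([A-Za-z]\w*)([+-])(?=\s|$)")

SAMPLE = """\
    UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
"""


def set_flags(lspci_text, register):
    """Return the flag names marked '+' on the given register line,
    or None if the register does not appear in the text."""
    for line in lspci_text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name == register:
            return [flag for flag, sign in FLAG.findall(rest) if sign == "+"]
    return None


def safe_to_enable_bus_master(lspci_text):
    """True only if a UESta line exists and none of its bits are set."""
    pending = set_flags(lspci_text, "UESta")
    return pending is not None and not pending


if __name__ == "__main__":
    print(set_flags(SAMPLE, "UESta"), safe_to_enable_bus_master(SAMPLE))
```

This only reports what lspci printed; it assumes each register is on a single line, as in real lspci output rather than the wrapped form quoted in mail.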
* [E1000-devel] ARM support for igb driver 2014-05-06 14:58 ` Alexander Duyck @ 2014-06-03 14:57 ` Ben Dooks 0 siblings, 0 replies; 28+ messages in thread From: Ben Dooks @ 2014-06-03 14:57 UTC (permalink / raw) To: linux-arm-kernel On 06/05/14 15:58, Alexander Duyck wrote: > Shiv, > > I think we are at the limits of what we can do from the Intel end. > Based on the comments from Thomas it sounds like there shouldn't be any > issues specifically with ARM that prevent the use of IGB PCIe devices, > and the fact is the error your are seeing "Unhandled fault: imprecise > external abort (0x1406) at 0x00000000" can indicate some sort of bus > fault. It would probably be best to work with someone more familiar > with the inner workings of ARM CPUs as they might be able to work out > some other workarounds for the external abort. I did send a pair of patches for this, as imprecise aborts are not traceable to their source instruction and there is a case that it should not abort any tasks. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-05 21:20 ` Alexander Duyck [not found] ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com> @ 2014-05-06 15:18 ` Arnd Bergmann 2014-05-06 15:24 ` Lucas Stach 2014-06-03 14:49 ` Ben Dooks 2 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-05-06 15:18 UTC (permalink / raw) To: linux-arm-kernel On Monday 05 May 2014, Alexander Duyck wrote: > Glad to hear that this is working on your ARM platform as expected. > > I believe the issue Shiv is having is due to a problem with the specific > platform as the IGB device is reporting a Data Link Protocol error via > AER and I believe this is what is causing his platform issues. On > enabling BME the device is likely signalling a Fatal Error message in > response to the DLP error. The original error he was seeing was: > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 This isn't too uncommon. There are a couple of traditional PCI host drivers that register an imprecise external abort handler to catch this and then look at the host controller registers. Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches this error, but it then goes on to ignore it, not even printing a message about it. Shiv, which host controller driver are you using? Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-06 15:18 ` Arnd Bergmann @ 2014-05-06 15:24 ` Lucas Stach 2014-05-06 15:33 ` Arnd Bergmann 0 siblings, 1 reply; 28+ messages in thread From: Lucas Stach @ 2014-05-06 15:24 UTC (permalink / raw) To: linux-arm-kernel On Tuesday, 06.05.2014, at 17:18 +0200, Arnd Bergmann wrote: > On Monday 05 May 2014, Alexander Duyck wrote: > > Glad to hear that this is working on your ARM platform as expected. > > > > I believe the issue Shiv is having is due to a problem with the specific > > platform as the IGB device is reporting a Data Link Protocol error via > > AER and I believe this is what is causing his platform issues. On > > enabling BME the device is likely signalling a Fatal Error message in > > response to the DLP error. The original error he was seeing was: > > > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 > > This isn't too uncommon. There are a couple of traditional PCI host drivers > that register an imprecise external abort handler to catch this and > then look at the host controller registers. > > Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches > this error, but it then goes on to ignore it, not even printing > a message about it. > I think this handler is mostly there to handle the imprecise external abort happening on DW PCIe IP if the bus scan tries to access a non-existent device. That's why it silently ignores this error. BTW: I can confirm that igb i350 works on i.MX6. Regards, Lucas -- Pengutronix e.K. | Lucas Stach | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-06 15:24 ` Lucas Stach @ 2014-05-06 15:33 ` Arnd Bergmann 2014-05-30 11:50 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-05-06 15:33 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 06 May 2014 17:24:33 Lucas Stach wrote: > Am Dienstag, den 06.05.2014, 17:18 +0200 schrieb Arnd Bergmann: > > On Monday 05 May 2014, Alexander Duyck wrote: > > > Glad to hear that this is working on your ARM platform as expected. > > > > > > I believe the issue Shiv is having is due to a problem with the specific > > > platform as the IGB device is reporting a Data Link Protocol error via > > > AER and I believe this is what is causing his platform issues. On > > > enabling BME the device is likely signalling a Fatal Error message in > > > response to the DLP error. The original error he was seeing was: > > > > > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 > > > > This isn't too uncommon. There are a couple of traditional PCI host drivers > > that register an imprecise external abort handler to catch this and > > then look at the host controller registers. > > > > Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches > > this error, but it then goes on to ignore it, not even printing > > a message about it. > > > I think this handler is mostly there to handle the imprecise external > abort happening on DW pcie IP if the bus scan tries to access an > non-existent device. That's why it silently ignores this error. That sounds rather dangerous. The driver should probably check for the particular condition it tries to avoid and print a debug message in that case, or halt the machine if it finds any unknown error, to prevent propagation of incorrect data. Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-06 15:33 ` Arnd Bergmann @ 2014-05-30 11:50 ` shiv prakash Agarwal 2014-05-30 17:21 ` Alexander Duyck 0 siblings, 1 reply; 28+ messages in thread From: shiv prakash Agarwal @ 2014-05-30 11:50 UTC (permalink / raw) To: linux-arm-kernel Thanks all, We finally see that this hang occurs because this I210 card sends some VDM. Why does this card send VDMs, and how can we disable them? On Tue, May 6, 2014 at 9:03 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 06 May 2014 17:24:33 Lucas Stach wrote: >> Am Dienstag, den 06.05.2014, 17:18 +0200 schrieb Arnd Bergmann: >> > On Monday 05 May 2014, Alexander Duyck wrote: >> > > Glad to hear that this is working on your ARM platform as expected. >> > > >> > > I believe the issue Shiv is having is due to a problem with the specific >> > > platform as the IGB device is reporting a Data Link Protocol error via >> > > AER and I believe this is what is causing his platform issues. On >> > > enabling BME the device is likely signalling a Fatal Error message in >> > > response to the DLP error. The original error he was seeing was: >> > > >> > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 >> > >> > This isn't too uncommon. There are a couple of traditional PCI host drivers >> > that register an imprecise external abort handler to catch this and >> > then look at the host controller registers. >> > >> > Out of the pcie hosts, only drivers/pci/host/pci-imx6.c catches >> > this error, but it then goes on to ignore it, not even printing >> > a message about it. >> > >> I think this handler is mostly there to handle the imprecise external >> abort happening on DW pcie IP if the bus scan tries to access an >> non-existent device. That's why it silently ignores this error. 
> > That sounds rather dangerous, the driver should probably check for > the particular condition it tries to avoid and print a debug message > in that case, or halt the machine if finds any unknown error, to > prevent propagation of incorrect data. > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 11:50 ` shiv prakash Agarwal @ 2014-05-30 17:21 ` Alexander Duyck 2014-05-30 18:02 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Alexander Duyck @ 2014-05-30 17:21 UTC (permalink / raw) To: linux-arm-kernel On 05/30/2014 04:50 AM, shiv prakash Agarwal wrote: > Thanks all, > > Finally we see that this hang occurs because some VDM is sent by this I210 card. > Why this card sends VDM? and how can we disable it? I'm not sure what you mean by VDM. Are you referring to the AER error message that is sent by the part? If so I believe this is being sent because the I210 is either misconfigured or because the platform is violating the PCIe spec in some way that is triggering the device to send an error message. Remember the key bit in all of this is the status of the device before you load the driver: > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- That DLP+ indicates an error occurred and that the device will send an error message as soon as bus mastering is enabled. One thing I would recommend trying is clearing the UESta and UESvrt bits so that they all read as 0, or - in the lspci dump. Then you might try resetting the part via the sysfs reset control and verify that those bits are still cleared. However, at this point it seems like this platform you are running the part in has some PCIe issues and that is beyond the scope of what we can really debug from the driver and OS stack. 
To resolve it you would likely need a PCIe protocol analyzer so you could see what the DLP error actually was. Thanks, Alex ^ permalink raw reply [flat|nested] 28+ messages in thread
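The flag names in the lspci dump map onto bit positions in the AER Uncorrectable Error Status register, whose bits are write-1-to-clear. That is why writing the read value back, or writing all ones, clears the latched DLP bit. The following is a minimal sketch of that arithmetic; the bit table is an assumption based on the standard PCIe AER layout, and the helper names are hypothetical. It only models the register and does not touch hardware.

```python
# Assumed bit positions of the AER Uncorrectable Error Status/Mask/Severity
# registers, keyed by the abbreviations lspci prints.
UNCORRECTABLE_BITS = {
    "DLP": 4, "SDES": 5, "TLP": 12, "FCP": 13, "CmpltTO": 14,
    "CmpltAbrt": 15, "UnxCmplt": 16, "RxOF": 17, "MalfTLP": 18,
    "ECRC": 19, "UnsupReq": 20, "ACSViol": 21,
}

MASK32 = 0xFFFFFFFF


def encode(names):
    """Build a register value with the named bits set."""
    value = 0
    for name in names:
        value |= 1 << UNCORRECTABLE_BITS[name]
    return value


def decode(value):
    """List the names of the bits set in a register value."""
    return [n for n, bit in UNCORRECTABLE_BITS.items() if (value >> bit) & 1]


def status_after_write(status, written):
    """Model a write-1-to-clear register: each 1 written clears that bit,
    each 0 written leaves it unchanged."""
    return status & ~written & MASK32


if __name__ == "__main__":
    uesta = encode(["DLP"])                       # state shown in the dump
    print(hex(uesta), decode(uesta))
    print(hex(status_after_write(uesta, uesta)))  # write the value back
```

Note that this applies to the status register only. The severity register (UESvrt) is assumed to be an ordinary read/write register, so clearing its bits means writing zeros rather than ones.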
* [E1000-devel] ARM support for igb driver 2014-05-30 17:21 ` Alexander Duyck @ 2014-05-30 18:02 ` shiv prakash Agarwal 2014-05-30 19:18 ` Jason Gunthorpe 0 siblings, 1 reply; 28+ messages in thread From: shiv prakash Agarwal @ 2014-05-30 18:02 UTC (permalink / raw) To: linux-arm-kernel VDM = Vendor Defined Message On Fri, May 30, 2014 at 10:51 PM, Alexander Duyck <alexander.h.duyck@intel.com> wrote: > On 05/30/2014 04:50 AM, shiv prakash Agarwal wrote: >> Thanks all, >> >> Finally we see that this hang occurs because some VDM is sent by this I210 card. >> Why this card sends VDM? and how can we disable it? > > > I'm not sure what you mean by VDM? Are you referring to the AER error > message that is sent by the part? If so I believe this is being sent > because the I210 is either misconfigured or because the platform is > violating PCIe spec in some way that is triggering the device to send an > error message. Remember the key bit in all of this is the status of the > device before you load the driver: > >> Capabilities: [100 v2] Advanced Error Reporting >> UESta: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- >> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr- >> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ >> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- > > That DLP+ indicates an error occurred and that the device will send an > error message as soon as the bus mastering is enabled. > > One thing I would recommend trying is clearing the UEsta and UESvrt bits > so that they all read as 0, or - in the lspci dump. Then you might try > resetting the part via the sysfs reset control and verify that those > bits are still cleared. 
However at this point it seems like this > platform you are running the part in has some PCIe issues and that is > beyond the scope of what we can really debug from the driver and OS > stack. To resolve it you would likely need a PCIe protocol analyzer so > you could see what the DLP error actually was. > > Thanks, > > Alex > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 18:02 ` shiv prakash Agarwal @ 2014-05-30 19:18 ` Jason Gunthorpe 2014-05-30 19:35 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Jason Gunthorpe @ 2014-05-30 19:18 UTC (permalink / raw) To: linux-arm-kernel On Fri, May 30, 2014 at 11:32:34PM +0530, shiv prakash Agarwal wrote: > Finally we see that this hang occurs because some VDM is sent by > this I210 card. Why this card sends VDM? and how can we disable it? > VDM = Venodor Defined Message FWIW, when I last looked at Intel stuff in an analyzer it was sending regular VDMs for some purpose. Type 1 VDMs should not cause any errors, the root port should just silently discard them. If your root port is returning an error completion or otherwise from a type 1 VDM then it is broken and the device would be properly asserting DLP. Some work around would be required for that kind of HW defect :| I once tracked down a similar bug with VDM handling in a PCI-E device.. Jason ^ permalink raw reply [flat|nested] 28+ messages in thread
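The difference between the two vendor-defined message types described above can be written down as a small decision table. The following is a minimal sketch, assuming the standard PCIe message codes (0x7E for Vendor_Defined Type 0, 0x7F for Vendor_Defined Type 1); the function name `expected_handling` and the returned strings are illustrative only.

```python
# Assumed PCIe message codes for vendor-defined messages.
VENDOR_DEFINED_TYPE0 = 0x7E
VENDOR_DEFINED_TYPE1 = 0x7F


def expected_handling(message_code, receiver_supports_it):
    """Describe what a receiver is expected to do with a vendor-defined
    message, depending on whether it implements that message."""
    if message_code not in (VENDOR_DEFINED_TYPE0, VENDOR_DEFINED_TYPE1):
        raise ValueError("not a vendor-defined message code")
    if receiver_supports_it:
        return "process"
    if message_code == VENDOR_DEFINED_TYPE1:
        # Type 1 exists so that unaware receivers can ignore it safely.
        return "silently discard"
    return "unsupported request"


if __name__ == "__main__":
    print(expected_handling(VENDOR_DEFINED_TYPE1, receiver_supports_it=False))
```

A root port that does anything other than discard an unsupported Type 1 message would match the kind of hardware defect suspected in this thread.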
* [E1000-devel] ARM support for igb driver 2014-05-30 19:18 ` Jason Gunthorpe @ 2014-05-30 19:35 ` shiv prakash Agarwal 2014-05-30 19:56 ` Arnd Bergmann 0 siblings, 1 reply; 28+ messages in thread From: shiv prakash Agarwal @ 2014-05-30 19:35 UTC (permalink / raw) To: linux-arm-kernel Thanks Jason, Is there a way to stop the Intel card from sending VDMs? And what is the purpose of sending a type 1 VDM if the root port has to discard it anyway? On Sat, May 31, 2014 at 12:48 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Fri, May 30, 2014 at 11:32:34PM +0530, shiv prakash Agarwal wrote: >> Finally we see that this hang occurs because some VDM is sent by >> this I210 card. Why this card sends VDM? and how can we disable it? > >> VDM = Venodor Defined Message > > FWIW, when I last looked at Intel stuff in an analyzer it was sending > regular VDMs for some purpose. > > Type 1 VDMs should not cause any errors, the root port should just > silently discard them. > > If your root port is returning an error completion or otherwise from a > type 1 VDM then it is broken and the device would be properly > asserting DLP. Some work around would be required for that kind of HW > defect :| > > I once tracked down a similar bug with VDM handling in a PCI-E > device.. > > Jason ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 19:35 ` shiv prakash Agarwal @ 2014-05-30 19:56 ` Arnd Bergmann 2014-05-30 20:14 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-05-30 19:56 UTC (permalink / raw) To: linux-arm-kernel On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote: > Thanks Jason, > > Is there a way to disable sending VDM by Intel card? > And what is the purpose of sending type 1 VDM if it has to be > discarded by root port anyway? I think you should really just disable the behavior at the root port. Which host driver are you using (sorry if I forgot and you already mentioned it)? Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 19:56 ` Arnd Bergmann @ 2014-05-30 20:14 ` shiv prakash Agarwal 2014-05-30 21:11 ` Fujinaka, Todd 2014-05-31 18:34 ` Arnd Bergmann 0 siblings, 2 replies; 28+ messages in thread From: shiv prakash Agarwal @ 2014-05-30 20:14 UTC (permalink / raw) To: linux-arm-kernel Not sure about the root port. Can't it be disabled by the Intel device? On Sat, May 31, 2014 at 1:26 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote: >> Thanks Jason, >> >> Is there a way to disable sending VDM by Intel card? >> And what is the purpose of sending type 1 VDM if it has to be >> discarded by root port anyway? > > I think you should really just disable the behavior at the root port. > Which host driver are you using (sorry if I forgot and you already mentioned > it)? > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 20:14 ` shiv prakash Agarwal @ 2014-05-30 21:11 ` Fujinaka, Todd 2014-05-31 18:34 ` Arnd Bergmann 1 sibling, 0 replies; 28+ messages in thread From: Fujinaka, Todd @ 2014-05-30 21:11 UTC (permalink / raw) To: linux-arm-kernel It took me just one Google search to find that VDM is being used for MCTP. The datasheet for the i210 (that I'm sure I suggested you consult) discusses MCTP in section 10.7. I can't see that there is a way to turn off MCTP. Todd Fujinaka Software Application Engineer Networking Division (ND) Intel Corporation todd.fujinaka at intel.com (503) 712-4565 -----Original Message----- From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com] Sent: Friday, May 30, 2014 1:14 PM To: Arnd Bergmann Cc: linux-arm-kernel at lists.infradead.org; Jason Gunthorpe; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Fujinaka, Todd; Lucas Stach Subject: Re: [E1000-devel] ARM support for igb driver Not sure about the root port. Can't it be disabled by the Intel device? On Sat, May 31, 2014 at 1:26 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Saturday 31 May 2014 01:05:29 shiv prakash Agarwal wrote: >> Thanks Jason, >> >> Is there a way to disable sending VDM by Intel card? >> And what is the purpose of sending type 1 VDM if it has to be >> discarded by root port anyway? > > I think you should really just disable the behavior at the root port. > Which host driver are you using (sorry if I forgot and you already > mentioned it)? > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-30 20:14 ` shiv prakash Agarwal 2014-05-30 21:11 ` Fujinaka, Todd @ 2014-05-31 18:34 ` Arnd Bergmann 2014-06-01 5:56 ` shiv prakash Agarwal 1 sibling, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-05-31 18:34 UTC (permalink / raw) To: linux-arm-kernel On Saturday 31 May 2014 01:44:28 shiv prakash Agarwal wrote: > Not sure about the root port. Can't it be disabled by the Intel device? My point is that it's the wrong place to disable it: every device is allowed to generate this type of VDMs, and the root port is supposed to silently ignore them if it doesn't handle them. If the root port doesn't do that, it's a bug in the host bridge driver, not in some device driver that happens to operate a device within the specification. Which host bridge driver do you use? Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-31 18:34 ` Arnd Bergmann @ 2014-06-01 5:56 ` shiv prakash Agarwal 2014-06-01 11:46 ` Arnd Bergmann 0 siblings, 1 reply; 28+ messages in thread From: shiv prakash Agarwal @ 2014-06-01 5:56 UTC (permalink / raw) To: linux-arm-kernel I don't see all devices send VDMs, then why Intel I-210? Also, is it a bug in host bridge hardware or driver? If hardware, how can we make device not to send it? On Sun, Jun 1, 2014 at 12:04 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Saturday 31 May 2014 01:44:28 shiv prakash Agarwal wrote: >> Not sure about the root port. Can't it be disabled by the Intel device? > > My point is that it's the wrong place to disable it: every device > is allowed to generate this type of VDMs, and the root port is > supposed to silently ignore them if it doesn't handle them. > > If the root port doesn't do that, it's a bug in the host bridge > driver, not in some device driver that happens to operate a > device within the specification. > > Which host bridge driver do you use? > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-06-01 5:56 ` shiv prakash Agarwal @ 2014-06-01 11:46 ` Arnd Bergmann 2014-06-02 4:53 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-06-01 11:46 UTC (permalink / raw) To: linux-arm-kernel On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote: > I don't see all devices send VDMs, then why Intel I-210? There is no obligation to do it of course. > Also, Is it a bug in host bridge hardware or driver? If hardware, how > can we make device not to send it? If the hardware cannot handle them, it's a hardware bug. If the hardware does handle them correctly but the software doesn't, that is a bug in the bridge driver. We have a couple of host bridge drivers that register a trap handler and then look at the bridge registers to determine the exact cause. Which host bridge driver do you use? Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-06-01 11:46 ` Arnd Bergmann @ 2014-06-02 4:53 ` shiv prakash Agarwal 2014-06-02 16:05 ` Fujinaka, Todd 0 siblings, 1 reply; 28+ messages in thread From: shiv prakash Agarwal @ 2014-06-02 4:53 UTC (permalink / raw) To: linux-arm-kernel Yes, it's a hardware bug. I need to know whether we can disable it from the device side. If yes, how? On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote: >> I don't see all devices send VDMs, then why Intel I-210? > > There is no obligation to do it of course. > >> Also, Is it a bug in host bridge hardware or driver? If hardware, how >> can we make device not to send it? > > If the hardware cannot handle them, it's a hardware bug. If the hardware > does handle them correctly but the software doesn't, that is a bug in > the bridge driver. > > We have a couple of host bridge drivers that register a trap handler > and then look at the bridge registers to determine the exact cause. > > Which host bridge driver do you use? > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-06-02 4:53 ` shiv prakash Agarwal @ 2014-06-02 16:05 ` Fujinaka, Todd 2014-06-03 12:33 ` shiv prakash Agarwal 0 siblings, 1 reply; 28+ messages in thread From: Fujinaka, Todd @ 2014-06-02 16:05 UTC (permalink / raw) To: linux-arm-kernel There is no hardware bug. The PCIe spec allows VDMs. Note Section 2.2.8.6 where there appear to be a couple of options. - (Receivers) Completers silently discard Vendor_Defined Type 1 Messages which they are not designed to receive - this is not an error condition. - (Receivers) Completers handle the receipt of an unsupported Vendor_Defined Type 0 Message as an Unsupported Request, and the error is reported according to Section 6.2. I think you may have MCTP enabled and you should be able to disable it in the EEPROM. I will need a lot more information about your system and whether the i210 is a LOM (LAN-on-motherboard, soldered onto your motherboard) or a NIC (what we call a plug-in PCIe card). Either way, you probably won't be able to get it changed without a working OS. If it's a NIC, you can take it out and put it in a non-ARM Linux system and send me a dump of your current EEPROM. Todd Fujinaka Software Application Engineer Networking Division (ND) Intel Corporation todd.fujinaka at intel.com (503) 712-4565 -----Original Message----- From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com] Sent: Sunday, June 01, 2014 9:54 PM To: Arnd Bergmann Cc: linux-arm-kernel at lists.infradead.org; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Jason Gunthorpe; Fujinaka, Todd; Lucas Stach Subject: Re: [E1000-devel] ARM support for igb driver Yes, it's a hardware bug. I need to know whether we can disable it from the device side. If yes, how? On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote: >> I don't see all devices send VDMs, then why Intel I-210? 
> > There is no obligation to do it of course. > >> Also, Is it a bug in host bridge hardware or driver? If hardware, how >> can we make device not to send it? > > If the hardware cannot handle them, it's a hardware bug. If the > hardware does handle them correctly but the software doesn't, that is > a bug in the bridge driver. > > We have a couple of host bridge drivers that register a trap handler > and then look at the bridge registers to determine the exact cause. > > Which host bridge driver do you use? > > Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
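The two receiver rules Todd quotes from Section 2.2.8.6 can be sketched as a small decision function. This is only an illustrative model, not code from any root port or driver: the names receive_vdm, Action and supported_vendor_ids are hypothetical, and the DMTF vendor ID used by the MCTP-over-PCIe-VDM binding is quoted from memory and should be checked against the binding specification.

```python
from enum import Enum

# Message Code values the PCIe base spec assigns to Vendor_Defined Messages.
MSG_CODE_VDM_TYPE0 = 0x7E
MSG_CODE_VDM_TYPE1 = 0x7F

# Assumption: vendor ID carried in MCTP-over-PCIe VDMs (DMTF), from memory.
DMTF_VENDOR_ID = 0x1AB4


class Action(Enum):
    HANDLE = "handle"
    SILENT_DISCARD = "silently discard"
    UNSUPPORTED_REQUEST = "report Unsupported Request"


def receive_vdm(msg_code, vendor_id, supported_vendor_ids=frozenset()):
    """Hypothetical model of how a receiver treats an incoming VDM."""
    if msg_code not in (MSG_CODE_VDM_TYPE0, MSG_CODE_VDM_TYPE1):
        raise ValueError("not a Vendor_Defined Message code: %#x" % msg_code)
    if vendor_id in supported_vendor_ids:
        # The receiver was designed to accept this vendor's messages.
        return Action.HANDLE
    if msg_code == MSG_CODE_VDM_TYPE1:
        # Unsupported Type 1: dropped without any error being signalled.
        return Action.SILENT_DISCARD
    # Unsupported Type 0: treated as an Unsupported Request.
    return Action.UNSUPPORTED_REQUEST
```

Under this model, receive_vdm(MSG_CODE_VDM_TYPE1, DMTF_VENDOR_ID) on a root port with no MCTP support comes back as a silent discard, which is the behaviour Jason and Arnd expect from a compliant root port.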
* [E1000-devel] ARM support for igb driver 2014-06-02 16:05 ` Fujinaka, Todd @ 2014-06-03 12:33 ` shiv prakash Agarwal 0 siblings, 0 replies; 28+ messages in thread From: shiv prakash Agarwal @ 2014-06-03 12:33 UTC (permalink / raw) To: linux-arm-kernel Thanks. Yes, it is a NIC. How do I get a dump of its EEPROM? On Mon, Jun 2, 2014 at 9:35 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote: > There is no hardware bug. The PCIe spec allows VDMs. Note Section 2.2.8.6 where there appear to be a couple of options. > > - (Receivers) Completers silently discard Vendor_Defined Type 1 Messages which they are not designed to receive - this is not an error condition. > - (Receivers) Completers handle the receipt of an unsupported Vendor_Defined Type 0 Message as an Unsupported Request, and the error is reported according to Section 6.2. > > I think you may have MCTP enabled and you should be able to disable it in the EEPROM. I will need a lot more information about your system and whether the i210 is a LOM (LAN-on-motherboard, soldered onto your motherboard) or a NIC (what we call a plug-in PCIe card). Either way, you probably won't be able to get it changed without a working OS. > > If it's a NIC, you can take it out and put it in a non-ARM Linux system and send me a dump of your current EEPROM. > > Todd Fujinaka > Software Application Engineer > Networking Division (ND) > Intel Corporation > todd.fujinaka at intel.com > (503) 712-4565 > > -----Original Message----- > From: shiv prakash Agarwal [mailto:chhotu.shiv at gmail.com] > Sent: Sunday, June 01, 2014 9:54 PM > To: Arnd Bergmann > Cc: linux-arm-kernel at lists.infradead.org; Duyck, Alexander H; Thomas Petazzoni; e1000-devel at lists.sourceforge.net; Jason Gunthorpe; Fujinaka, Todd; Lucas Stach > Subject: Re: [E1000-devel] ARM support for igb driver > > Yes, it's a hardware bug. I need to know whether we can disable it from the device side. If yes, how? 
> > On Sun, Jun 1, 2014 at 5:16 PM, Arnd Bergmann <arnd@arndb.de> wrote: >> On Sunday 01 June 2014 11:26:57 shiv prakash Agarwal wrote: >>> I don't see all devices send VDMs, then why Intel I-210? >> >> There is no obligation to do it of course. >> >>> Also, Is it a bug in host bridge hardware or driver? If hardware, how >>> can we make device not to send it? >> >> If the hardware cannot handle them, it's a hardware bug. If the >> hardware does handle them correctly but the software doesn't, that is >> a bug in the bridge driver. >> >> We have a couple of host bridge drivers that register a trap handler >> and then look at the bridge registers to determine the exact cause. >> >> Which host bridge driver do you use? >> >> Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-05-05 21:20 ` Alexander Duyck [not found] ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com> 2014-05-06 15:18 ` Arnd Bergmann @ 2014-06-03 14:49 ` Ben Dooks 2014-06-03 15:05 ` Arnd Bergmann 2 siblings, 1 reply; 28+ messages in thread From: Ben Dooks @ 2014-06-03 14:49 UTC (permalink / raw) To: linux-arm-kernel On 05/05/14 22:20, Alexander Duyck wrote: > On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: >> Dear Alexander Duyck, >> >> On Mon, 05 May 2014 08:28:02 -0700, Alexander Duyck wrote: >> >>>> 1. So overall issue is any memory/config space access hangs(logs above) >>>> if bus master enable bit is set on IGB NIC card,this is not observed >>>> with E1000E NIC cards on same platform. >>>> >>>> 2. Above issue is repro'able on my ARM platform, not x86 ubuntu. Not >>>> sure how much its related to ARM though. >>>> >>>> 3. I saw below differences in lspci -vvv output b/w e1000e and igb, I am >>>> not sure if this has anything to do with above issue. >>>> RC config is same for both cases. >>>> >>>> IGB / E1000E >>>> >>>> Command Status: INTx+/INTx- >>>> PM Status: NoSoftRst+/NoSoftRst- >>>> DevCap: FLReset-/FLReset+ >>>> No Dev/Link2 Cap/Sta Registers for E1000E >>>> Some differences in AER Registers >>>> >>>> 4. Any idea, if this card is verified on ARM by anybody? >>>> >>> >>> It seems like you are glossing over the obvious issue. You said it >>> yourself, this works fine on x86. Therefore this is likely VERY related >>> to ARM, or at least your specific ARM platform configuration. >> >> Since I haven't seen the beginning of the thread, I might be completely >> off topic. However, I wanted to mention that I have successfully used >> and tested an IGB PCIe NIC on an ARM Armada XP platform. If that is >> useful, I'd be happy to provide you with additional details upon >> request. >> >> Best regards, >> >> Thomas >> > > > Thomas, > > Glad to hear that this is working on your ARM platform as expected. 
> > I believe the issue Shiv is having is due to a problem with the specific > platform as the IGB device is reporting a Data Link Protocol error via > AER and I believe this is what is causing his platform issues. On > enabling BME the device is likely signalling a Fatal Error message in > response to the DLP error. The original error he was seeing was: > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 I should sort out making these errors non-fatal to the system, there's not really much point in killing a process that may not have been the initiator of the problem. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 28+ messages in thread
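For reference, the 0x1406 in the "Unhandled fault" line quoted above is the fault status register value, and it can be decoded by hand. The sketch below assumes the ARMv7 short-descriptor DFSR layout (not LPAE); the mask names follow the ones I recall from arch/arm/mm/fault.h, and decode_fsr is a hypothetical helper, not a kernel function.

```python
# Field masks for the ARMv7 short-descriptor fault status register.
# Assumption: names mirror the kernel's FSR_* macros as recalled from memory.
FSR_FS3_0 = 0xF        # fault status bits [3:0]
FSR_FS4 = 1 << 10      # fault status bit 4
FSR_WRITE = 1 << 11    # WnR: set when the aborting access was a write
FSR_EXT = 1 << 12      # ExT: implementation-defined external abort type


def decode_fsr(fsr):
    """Split a fault status value into the fields discussed in the thread."""
    # FS[4] lives in bit 10, so shifting right by 6 moves it to bit 4.
    fs = (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6)
    return {
        "fs": fs,
        "write": bool(fsr & FSR_WRITE),
        "ext": bool(fsr & FSR_EXT),
    }


fields = decode_fsr(0x1406)
print(fields)  # {'fs': 22, 'write': False, 'ext': True}
```

FS = 22 (0b10110) is the asynchronous external abort encoding, which Linux prints as "imprecise external abort". Because the abort is asynchronous, the reported address (0x00000000 here) says nothing about the access that actually triggered it.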
* [E1000-devel] ARM support for igb driver 2014-06-03 14:49 ` Ben Dooks @ 2014-06-03 15:05 ` Arnd Bergmann 2014-06-03 15:13 ` Ben Dooks 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-06-03 15:05 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote: > On 05/05/14 22:20, Alexander Duyck wrote: > > On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: > > > > Glad to hear that this is working on your ARM platform as expected. > > > > I believe the issue Shiv is having is due to a problem with the specific > > platform as the IGB device is reporting a Data Link Protocol error via > > AER and I believe this is what is causing his platform issues. On > > enabling BME the device is likely signalling a Fatal Error message in > > response to the DLP error. The original error he was seeing was: > > > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000 > > I should sort out making these errors non-fatal to the system, there's > not really much point in killing a process that may not have been the > initiator of the problem. We really shouldn't catch those errors system-wide; that belongs in the specific host bridge driver, but Shiv refuses to say which one that is, so we can't fix it. Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-06-03 15:05 ` Arnd Bergmann @ 2014-06-03 15:13 ` Ben Dooks 2014-06-03 15:23 ` Arnd Bergmann 0 siblings, 1 reply; 28+ messages in thread From: Ben Dooks @ 2014-06-03 15:13 UTC (permalink / raw) To: linux-arm-kernel On 03/06/14 16:05, Arnd Bergmann wrote: > On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote: >> On 05/05/14 22:20, Alexander Duyck wrote: >>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: >>> >>> Glad to hear that this is working on your ARM platform as expected. >>> >>> I believe the issue Shiv is having is due to a problem with the specific >>> platform as the IGB device is reporting a Data Link Protocol error via >>> AER and I believe this is what is causing his platform issues. On >>> enabling BME the device is likely signalling a Fatal Error message in >>> response to the DLP error. The original error he was seeing was: >>> >>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000 >> >> I should sort out making these errors non-fatal to the system, there's >> not really much point in killing a process that may not have been the >> initiator of the problem. > > We really shouldn't catch those errors system-wide, it belongs into > the specific host bridge driver, but Shiv refuses to say which one that > is, so we can't fix it. I am not sure what else we can do: we either have to have a default null handler, or log that they have happened. The whole issue with the "imprecise external" part is that you have no idea what instruction (or core) caused the issue and IIRC there is very little information about what actually sent the abort. I believe these are useful to report as they tend to show that some part of the system has gone wrong. For example, we get them on the rcar-h2 if the system tries to access a unit that has not been properly clocked. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 28+ messages in thread
* [E1000-devel] ARM support for igb driver 2014-06-03 15:13 ` Ben Dooks @ 2014-06-03 15:23 ` Arnd Bergmann 2014-06-03 15:31 ` Ben Dooks 0 siblings, 1 reply; 28+ messages in thread From: Arnd Bergmann @ 2014-06-03 15:23 UTC (permalink / raw) To: linux-arm-kernel On Tuesday 03 June 2014 16:13:07 Ben Dooks wrote: > On 03/06/14 16:05, Arnd Bergmann wrote: > > On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote: > >> On 05/05/14 22:20, Alexander Duyck wrote: > >>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: > >>> > >>> Glad to hear that this is working on your ARM platform as expected. > >>> > >>> I believe the issue Shiv is having is due to a problem with the specific > >>> platform as the IGB device is reporting a Data Link Protocol error via > >>> AER and I believe this is what is causing his platform issues. On > >>> enabling BME the device is likely signalling a Fatal Error message in > >>> response to the DLP error. The original error he was seeing was: > >>> > >>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000 > >> > >> I should sort out making these errors non-fatal to the system, there's > >> not really much point in killing a process that may not have been the > >> initiator of the problem. > > > > We really shouldn't catch those errors system-wide, it belongs into > > the specific host bridge driver, but Shiv refuses to say which one that > > is, so we can't fix it. > > I am not sure what else we can do, either we either have to have a > default null handler, or log that they have happened. > > The whole issue with the "imprecise external" part is that you have > no idea what instruction (or core) caused the issue and IIRC there is > very little information about what actually sent the abort. > > I believe these are useful to report as they tend to show that some > part of the system has gone wrong. For example, we get them on the > rcar-h2 if the system tries to access a unit that has not been > properly clocked. 
In my experience, any unit that can send such an abort also has a diagnostic register that you can look into to find out at least the unit that triggered it so you can disable it. If none of the known sources caused the abort, it's generally best to shut down the system to avoid further data corruption. Arnd ^ permalink raw reply [flat|nested] 28+ messages in thread
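The policy described above (each driver checks its own diagnostic register, and an abort that nobody claims is treated as fatal) can be modelled in a few lines. This is a toy model rather than kernel code: AbortDispatcher, register and handle_abort are made-up names, and each probe callable stands in for a driver reading its unit's status register. On 32-bit ARM the real hook for this is, as far as I recall, hook_fault_code().

```python
class AbortDispatcher:
    """Toy model: ask each registered unit whether it raised the abort."""

    def __init__(self):
        self._probes = []  # list of (unit name, probe callable)

    def register(self, name, probe):
        # probe() returns True when the unit's diagnostic register shows
        # that this unit raised the abort.
        self._probes.append((name, probe))

    def handle_abort(self):
        for name, probe in self._probes:
            if probe():
                return name  # known source: its driver can recover or disable it
        return None  # unknown source: the caller should treat this as fatal


dispatcher = AbortDispatcher()
dispatcher.register("pcie-host-bridge", lambda: True)   # pretend its status bit is set
dispatcher.register("unclocked-unit", lambda: False)
source = dispatcher.handle_abort()
print(source if source is not None else "unknown source, shutting down")
```

Ben's objection in the follow-up is that some platforms have no such register to probe, in which case every abort ends up in the unknown branch.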
* [E1000-devel] ARM support for igb driver 2014-06-03 15:23 ` Arnd Bergmann @ 2014-06-03 15:31 ` Ben Dooks 0 siblings, 0 replies; 28+ messages in thread From: Ben Dooks @ 2014-06-03 15:31 UTC (permalink / raw) To: linux-arm-kernel On 03/06/14 16:23, Arnd Bergmann wrote: > On Tuesday 03 June 2014 16:13:07 Ben Dooks wrote: >> On 03/06/14 16:05, Arnd Bergmann wrote: >>> On Tuesday 03 June 2014 15:49:13 Ben Dooks wrote: >>>> On 05/05/14 22:20, Alexander Duyck wrote: >>>>> On 05/05/2014 01:38 PM, Thomas Petazzoni wrote: >>>>> >>>>> Glad to hear that this is working on your ARM platform as expected. >>>>> >>>>> I believe the issue Shiv is having is due to a problem with the specific >>>>> platform as the IGB device is reporting a Data Link Protocol error via >>>>> AER and I believe this is what is causing his platform issues. On >>>>> enabling BME the device is likely signalling a Fatal Error message in >>>>> response to the DLP error. The original error he was seeing was: >>>>> >>>>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000 >>>> >>>> I should sort out making these errors non-fatal to the system, there's >>>> not really much point in killing a process that may not have been the >>>> initiator of the problem. >>> >>> We really shouldn't catch those errors system-wide, it belongs into >>> the specific host bridge driver, but Shiv refuses to say which one that >>> is, so we can't fix it. >> >> I am not sure what else we can do, either we either have to have a >> default null handler, or log that they have happened. >> >> The whole issue with the "imprecise external" part is that you have >> no idea what instruction (or core) caused the issue and IIRC there is >> very little information about what actually sent the abort. >> >> I believe these are useful to report as they tend to show that some >> part of the system has gone wrong. For example, we get them on the >> rcar-h2 if the system tries to access a unit that has not been >> properly clocked. 
> > In my experience, any unit that can send such an abort also has > a diagnostic register that you can look into to find out at least the > unit that triggered it so you can disable it. Not on the rcar-h2. -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2014-06-03 15:31 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CAHH3p5KAwyMOY=ai_h0eM+JwYLtTO8d4jVaVOWVtOakhqPRTew@mail.gmail.com> [not found] ` <CF7EA37A.4676A%matthew.vick@intel.com> [not found] ` <CAHH3p5JFc3jgeK7OFd1rtgDS3gjRdjiPr3LwwqfyKMPGSTwMWA@mail.gmail.com> [not found] ` <CF7FD060.46A87%matthew.vick@intel.com> [not found] ` <CAHH3p5J8V6uWEzbCoT+x-6_2x1q=nc=_oMn0qnnj113FmeuW6A@mail.gmail.com> [not found] ` <9B4A1B1917080E46B64F07F2989DADD6533627A5@ORSMSX114.amr.corp.intel.com> [not found] ` <CAHH3p5KEPXw5mxP-13evd774nyO5C13ky0=fCaZPCNVx1gh3Ow@mail.gmail.com> [not found] ` <CAHH3p5LaRwgdECxsO4OZSdWEhGt232yj-3347kYEStP6NLg2xg@mail.gmail.com> [not found] ` <5360041A.7030005@intel.com> [not found] ` <CAHH3p5L6WM3FaZ19Tw9vUcsk+kERfBKnGC2BRtnftJR1pbSF7g@mail.gmail.com> 2014-05-05 15:28 ` [E1000-devel] ARM support for igb driver Alexander Duyck [not found] ` <CAHH3p5KEbvziVxg-9o45-chp=6U2JnW=ffk1p85GKFyWXiX8CQ@mail.gmail.com> 2014-05-05 20:00 ` Alexander Duyck 2014-05-05 20:38 ` Thomas Petazzoni 2014-05-05 21:20 ` Alexander Duyck [not found] ` <CAHH3p5L4DaORYJ8_6zMEY3ikEnABU1L0B2C6Fezb_g6dCjtr5Q@mail.gmail.com> 2014-05-06 14:58 ` Alexander Duyck 2014-06-03 14:57 ` Ben Dooks 2014-05-06 15:18 ` Arnd Bergmann 2014-05-06 15:24 ` Lucas Stach 2014-05-06 15:33 ` Arnd Bergmann 2014-05-30 11:50 ` shiv prakash Agarwal 2014-05-30 17:21 ` Alexander Duyck 2014-05-30 18:02 ` shiv prakash Agarwal 2014-05-30 19:18 ` Jason Gunthorpe 2014-05-30 19:35 ` shiv prakash Agarwal 2014-05-30 19:56 ` Arnd Bergmann 2014-05-30 20:14 ` shiv prakash Agarwal 2014-05-30 21:11 ` Fujinaka, Todd 2014-05-31 18:34 ` Arnd Bergmann 2014-06-01 5:56 ` shiv prakash Agarwal 2014-06-01 11:46 ` Arnd Bergmann 2014-06-02 4:53 ` shiv prakash Agarwal 2014-06-02 16:05 ` Fujinaka, Todd 2014-06-03 12:33 ` shiv prakash Agarwal 2014-06-03 14:49 ` Ben Dooks 2014-06-03 15:05 ` Arnd Bergmann 
2014-06-03 15:13 ` Ben Dooks 2014-06-03 15:23 ` Arnd Bergmann 2014-06-03 15:31 ` Ben Dooks