From: "Hillier, Gernot" <gernot.hillier@siemens.com>
To: Krzysztof Halasa <khc@pm.waw.pl>
Cc: jesse.brandeburg@intel.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, bruce.w.allan@intel.com
Subject: Re: e1000e: sporadic "hardware error"s with Intel 82563EB on Supermicro X7DB3
Date: Wed, 08 Oct 2008 15:35:35 +0200
Message-ID: <48ECB727.6050905@siemens.com>
In-Reply-To: <m33aj78jnh.fsf@maximus.localdomain>

Hello!

Krzysztof Halasa wrote:
> Hi,
> 
> "Hillier, Gernot" <gernot.hillier@siemens.com> writes:
> 
>> On at least two machines using the Supermicro X7DB3 board with Intel
>> 82563EB (a.k.a. PCI device 8086:1096), we see sporadic problems on modprobe
>> (about 1 time in some hundred tries):
>>
>> e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
>> e1000e: Copyright (c) 1999-2008 Intel Corporation.
>> e1000e 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
>> e1000e 0000:06:00.0: setting latency timer to 64
>> 0000:06:00.0: 0000:06:00.0: Hardware Error
> 
> What does "lspci -vv" say about it when the above happens?
> 
> A spurious chip reset (hardware) could probably cause that.

Here's the output of "lspci -vv" in the error case (for the eth devices):

------- SNIP -----------
06:00.0 Class 0200: Device 8086:1096 (rev 01)
        Subsystem: Device 15d9:1096
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at d0020000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 4000 [size=32]
        [virtual] Expansion ROM at d0080000 [disabled] [size=64K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 00000000feeff00c  Data: 4158
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Device Serial Number 06-c7-66-ff-ff-48-30-00
        Kernel driver in use: e1000e
        Kernel modules: e1000e

06:00.1 Class 0200: Device 8086:1096 (rev 01)
        Subsystem: Device 15d9:1096
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin B routed to IRQ 19
        Region 0: Memory at d0060000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at d0040000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 4020 [size=32]
        [virtual] Expansion ROM at d0090000 [disabled] [size=64K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM unknown, Latency L0 <128ns, L1 <64us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [140] Device Serial Number 06-c7-66-ff-ff-48-30-00
        Kernel driver in use: e1000e
        Kernel modules: e1000e
------- SNIP -----------

I retried this several times in both the error case and the normal case.
The only things that change are three values for device 06:00.0:

- Control: "DisINTx-" changes to "DisINTx+" if the card is correctly
  initialized
- Interrupt: changes from IRQ 18 to IRQ 4345 if the card is correctly
  initialized
- Message Signalled Interrupts: changes from "Enable-" to "Enable+"

In addition, the "Data" field of "Message Signalled Interrupts" seems to
change without any clear pattern.

For 06:00.1, everything seems to be the same in the error as well as in the
normal case.
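
For anyone who wants to repeat this comparison, here is a minimal sketch of
how two saved "lspci -vv" captures could be compared line by line. The
function name changed_lines is hypothetical. The ERROR_CASE lines are copied
from the dump above; the NORMAL_CASE lines are an assumption, reconstructed
from the three differences described here, not a real capture.

```python
import difflib

# Excerpt for device 06:00.0 in the error case (copied from the dump above).
ERROR_CASE = """\
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 18
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
"""

# Assumption: reconstructed from the description, not an actual capture.
NORMAL_CASE = """\
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 4345
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
"""


def changed_lines(before, after):
    """Return only the lines that differ between two lspci captures.

    Lines prefixed with "- " come from `before`, lines prefixed with "+ "
    come from `after`. Unchanged lines and ndiff hint lines are dropped.
    """
    a = [line.strip() for line in before.splitlines()]
    b = [line.strip() for line in after.splitlines()]
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in changed_lines(ERROR_CASE, NORMAL_CASE):
        print(line)
```

With real captures, the two strings would be read from files saved after a
failed and a successful modprobe. The MSI "Data" line is left out of the
excerpt because its value in the normal case is not given here.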

Does this tell you anything valuable?

-- 
Gernot Hillier, Siemens AG, CT SE 2
