public inbox for linux-kernel@vger.kernel.org
* e100 PCI bridge problem
@ 2007-07-13 17:37 William Montgomery
  2007-07-13 20:36 ` Kok, Auke
  2007-07-14 14:43 ` Krzysztof Halasa
  0 siblings, 2 replies; 14+ messages in thread
From: William Montgomery @ 2007-07-13 17:37 UTC (permalink / raw)
  To: linux-kernel


In an earlier post to the list I described a hard lockup condition
that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
and occurs on 2 out of 3 computers.

Further testing has revealed that the lockup can be prevented on all
computers by making sure the card is installed on the primary PCI bus.
If the card is installed in a slot on the secondary PCI bus (behind a
PCI to PCI bridge) the lockup occurs.

Are there any PCI tuning registers that I can tweak to get around
this problem?  Any changes I could make to the e100 driver to fix this?

Any help appreciated.

Regards,
Wm


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: e100 PCI bridge problem
  2007-07-13 17:37 e100 PCI bridge problem William Montgomery
@ 2007-07-13 20:36 ` Kok, Auke
  2007-07-13 22:30   ` William Montgomery
  2007-07-14 14:43 ` Krzysztof Halasa
  1 sibling, 1 reply; 14+ messages in thread
From: Kok, Auke @ 2007-07-13 20:36 UTC (permalink / raw)
  To: William Montgomery; +Cc: linux-kernel

William Montgomery wrote:
> In an earlier post to the list I described a hard lockup condition
> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
> a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
> and occurs on 2 out of 3 computers.
> 
> Further testing has revealed that the lockup can be prevented on all
> computers by making sure the card is installed on the primary PCI bus.
> If the card is installed in a slot on the secondary PCI bus (behind a
> PCI to PCI bridge) the lockup occurs.

sounds like int-A/B/C/D routing issues

> Are there any PCI tuning registers that I can tweak to get around
> this problem?  Any changes I could make to the e100 driver to fix this?

this issue might be resolvable by quirking the bridge chips and adjusting any 
APIC where needed. Unfortunately I don't know much about this but it's 
physically not possible from the e100 driver. The special (non-intel) card that 
has these 4 ports onboard contains a bridge chip itself which explains the 
issues. Even a BIOS issue could be the cause here.

Perhaps the linuxfirmwarekit will reveal more information. In any case, fixing 
this in software would be a gigantic effort.

Auke


* Re: e100 PCI bridge problem
  2007-07-13 20:36 ` Kok, Auke
@ 2007-07-13 22:30   ` William Montgomery
  2007-07-13 22:41     ` Kok, Auke
  0 siblings, 1 reply; 14+ messages in thread
From: William Montgomery @ 2007-07-13 22:30 UTC (permalink / raw)
  To: Kok, Auke; +Cc: linux-kernel

Thanks for responding.  I am very interested to find the source of this 
problem.

Kok, Auke wrote:

> William Montgomery wrote:
>
>> In an earlier post to the list I described a hard lockup condition
>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>> a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
>> and occurs on 2 out of 3 computers.
>>
>> Further testing has revealed that the lockup can be prevented on all
>> computers by making sure the card is installed on the primary PCI bus.
>> If the card is installed in a slot on the secondary PCI bus (behind a
>> PCI to PCI bridge) the lockup occurs.
>
>
> sounds like int-A/B/C/D routing issues

The strange thing is that all the ports on the card work fine for a few 
minutes, then when some condition (as yet unknown) occurs the system 
locks up hard.  I am currently using a PCI bus analyzer to capture bus 
activity just prior to the lockup to try and find out what leads up to 
this condition.

>
>> Are there any PCI tuning registers that I can tweak to get around
>> this problem?  Any changes I could make to the e100 driver to fix this?
>
>
> this issue might be resolvable by quirking the bridgee chips and 
> adjusting any APIC where needed. Unfortunately I don't know much about 
> this but it's physically not possible from the e100 driver. The 
> special (non-intel) card that has these 4 ports onboard contains a 
> bridge chip itself which explains the issues. Even a BIOS issue could 
> be the cause here.

I am aware of the bridge chip on the card but not sure what you mean 
when you say this explains the issues?  I sure would like to figure out 
a way around this.

The PCI info follows:
  00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
  00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
  00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 02)
  00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 02)
  00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 02)
  00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 02)
  00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82)
  00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 02)
  00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 02)
  00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 02)
  01:08.0 Ethernet controller: Intel Corp. 82801BD PRO/100 VE (CNR) Ethernet Controller (rev 82)
  01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02)
  02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 15)
  03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
  03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
  03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
  03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)

>
> Perhaps the linuxfirmwarekit will reveal more information. In any 
> case, fixing this in software would be a gigantic effort.
>
I will look into that on Monday and report what I find.  It seems like 
it is premature to say how much effort the fix will take since the 
problem is not yet known?  At least not known to me yet.  I would just 
like to find out what parameters on the bridge/bridges might affect this 
problem and how to modify them.
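
One way to at least inspect such bridge parameters is to read the
bridge's configuration header directly.  The following is only a
minimal sketch, assuming a kernel that exposes
/sys/bus/pci/devices/<address>/config; the function names
bridge_tuning and read_bridge are hypothetical.  The offsets are the
standard ones for a PCI-to-PCI bridge (type 1) header.  Whether any of
these fields has anything to do with the lockup is not established.

```python
import struct

# Standard offsets in a PCI-to-PCI bridge (type 1) configuration header.
CACHE_LINE_SIZE = 0x0C
LATENCY_TIMER = 0x0D      # primary-side latency timer, in PCI clocks
HEADER_TYPE = 0x0E
PRIMARY_BUS = 0x18
SECONDARY_BUS = 0x19
SUBORDINATE_BUS = 0x1A
SEC_LATENCY_TIMER = 0x1B  # secondary-side latency timer
BRIDGE_CONTROL = 0x3E     # 16-bit bridge control register


def bridge_tuning(config: bytes) -> dict:
    """Extract bus numbers and timing fields from a 64-byte bridge header."""
    if len(config) < 0x40:
        raise ValueError("need at least the 64-byte standard header")
    # Bit 7 of the header type is the multi-function flag; mask it off.
    if config[HEADER_TYPE] & 0x7F != 1:
        raise ValueError("not a PCI-to-PCI bridge header")
    (bridge_ctl,) = struct.unpack_from("<H", config, BRIDGE_CONTROL)
    return {
        "cache_line_size": config[CACHE_LINE_SIZE],
        "latency": config[LATENCY_TIMER],
        "sec_latency": config[SEC_LATENCY_TIMER],
        "primary_bus": config[PRIMARY_BUS],
        "secondary_bus": config[SECONDARY_BUS],
        "subordinate_bus": config[SUBORDINATE_BUS],
        "bridge_control": bridge_ctl,
    }


def read_bridge(address: str) -> dict:
    """Read a bridge header via sysfs; address looks like '0000:01:0c.0'."""
    with open(f"/sys/bus/pci/devices/{address}/config", "rb") as f:
        return bridge_tuning(f.read(0x40))
```

This only reads the registers.  Changing them would normally be done
with a tool such as setpci from pciutils, and a wrong value written to
a bridge can itself hang the machine.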

> Auke
>



* Re: e100 PCI bridge problem
  2007-07-13 22:30   ` William Montgomery
@ 2007-07-13 22:41     ` Kok, Auke
  2007-07-14  0:54       ` William Montgomery
  0 siblings, 1 reply; 14+ messages in thread
From: Kok, Auke @ 2007-07-13 22:41 UTC (permalink / raw)
  To: William Montgomery; +Cc: linux-kernel

William Montgomery wrote:
> Thanks for responding.  I am very interested to find the source of this 
> problem.
> 
> Kok, Auke wrote:
> 
>> William Montgomery wrote:
>>
>>> In an earlier post to the list I described a hard lockup condition
>>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>> a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
>>> and occurs on 2 out of 3 computers.
>>>
>>> Further testing has revealed that the lockup can be prevented on all
>>> computers by making sure the card is installed on the primary PCI bus.
>>> If the card is installed in a slot on the secondary PCI bus (behind a
>>> PCI to PCI bridge) the lockup occurs.
>>
>> sounds like int-A/B/C/D routing issues
> 
> The strange thing is that all the ports on the card work fine for a few 
> minutes, then when some condition (as yet unknown) occurs the system 
> locks up hard.  I am currently using a PCI bus analyzer to capture bus 
> activity just prior to the lockup to try and find out what leads up to 
> this condition.

are you running any form of irqbalance, either in-kernel (bad) or the userspace 
(better) one?

>>> Are there any PCI tuning registers that I can tweak to get around
>>> this problem?  Any changes I could make to the e100 driver to fix this?
>>
>> this issue might be resolvable by quirking the bridgee chips and 
>> adjusting any APIC where needed. Unfortunately I don't know much about 
>> this but it's physically not possible from the e100 driver. The 
>> special (non-intel) card that has these 4 ports onboard contains a 
>> bridge chip itself which explains the issues. Even a BIOS issue could 
>> be the cause here.
> 
> I am aware of the bridge chip on the card but not sure what you mean 
> when you say this explains the issues?  I sure would like to figure out 
> a way around this.

irq routing in linux may not be the same as in windows. I have no idea how to 
compare them either (dmesg will show the linux setup, but I don't know how to 
retrieve this info under linux).

> The PCI info follows:
>   00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM 
> Controller/Host-Hub Interface (rev 03)
>   00:02.0 VGA compatible controller: Intel Corp. 
> 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
>   00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 02)
>   00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 02)
>   00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 02)
>   00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI 
> Controller (rev 02)
>   00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to 
> PCI Bridge (rev 82)
>   00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 02)
>   00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 
> Storage Controller (rev 02)
>   00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 02)
>   01:08.0 Ethernet controller: Intel Corp. 82801BD PRO/100 VE (CNR) 
> Ethernet Controller (rev 82)
>   01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02)
>   02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge 
> (non-transparent mode) (rev 15)
>   03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] 
> (rev 08)
>   03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] 
> (rev 08)
>   03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] 
> (rev 08)
>   03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] 
> (rev 08)
> 
>> Perhaps the linuxfirmwarekit will reveal more information. In any 
>> case, fixing this in software would be a gigantic effort.
>>
> I will look into that on Monday and report what I find.  It seems like 
> it is premature to say how much effort the fix will take since the 
> problem is not yet known?  At least not known to me yet.  I would just 
> like to find out what parameters on the bridge/bridges might affect this 
> problem and how to modify them.

I personally have no idea and am not knowledgeable enough on this issue, sorry :)

Auke

> 
>> Auke
>>


* Re: e100 PCI bridge problem
  2007-07-13 22:41     ` Kok, Auke
@ 2007-07-14  0:54       ` William Montgomery
  0 siblings, 0 replies; 14+ messages in thread
From: William Montgomery @ 2007-07-14  0:54 UTC (permalink / raw)
  To: Kok, Auke; +Cc: linux-kernel

Kok, Auke wrote:

> William Montgomery wrote:
>
>> Thanks for responding.  I am very interested to find the source of 
>> this problem.
>>
>> Kok, Auke wrote:
>>
>>> William Montgomery wrote:
>>>
>>>> In an earlier post to the list I described a hard lockup condition
>>>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>>> a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
>>>> and occurs on 2 out of 3 computers.
>>>>
>>>> Further testing has revealed that the lockup can be prevented on all
>>>> computers by making sure the card is installed on the primary PCI bus.
>>>> If the card is installed in a slot on the secondary PCI bus (behind a
>>>> PCI to PCI bridge) the lockup occurs.
>>>
>>>
>>> sounds like int-A/B/C/D routing issues
>>
>>
>> The strange thing is that all the ports on the card work fine for a 
>> few minutes, then when some condition (as yet unknown) occurs the 
>> system locks up hard.  I am currently using a PCI bus analyzer to 
>> capture bus activity just prior to the lockup to try and find out 
>> what leads up to this condition.
>
>
> are you running any form of irqbalance, either in-kernel (bad) or the 
> userspace (better) one?

No.  This is a Pentium 4 - single core, 2.8GHz.

>
>>>> Are there any PCI tuning registers that I can tweak to get around
>>>> this problem?  Any changes I could make to the e100 driver to fix 
>>>> this?
>>>
>>>
>>> this issue might be resolvable by quirking the bridgee chips and 
>>> adjusting any APIC where needed. Unfortunately I don't know much 
>>> about this but it's physically not possible from the e100 driver. 
>>> The special (non-intel) card that has these 4 ports onboard contains 
>>> a bridge chip itself which explains the issues. Even a BIOS issue 
>>> could be the cause here.
>>
>>
>> I am aware of the bridge chip on the card but not sure what you mean 
>> when you say this explains the issues?  I sure would like to figure 
>> out a way around this.
>
>
> irq routing in linux may not be the same as in windows. I have no idea 
> how to compare them either (dmesg will show the linux setup, but I 
> don't know how to retreive this info under linux).

Not sure how windows applies here; I only use Linux.  The main data 
point so far is that the card works fine when on the primary PCI bus but 
locks up hard after a few minutes when installed in a slot behind a PCI 
to PCI bridge.  I can provide the dmesg info on Monday.



* Re: e100 PCI bridge problem
  2007-07-13 17:37 e100 PCI bridge problem William Montgomery
  2007-07-13 20:36 ` Kok, Auke
@ 2007-07-14 14:43 ` Krzysztof Halasa
  2007-07-14 23:17   ` William Montgomery
  1 sibling, 1 reply; 14+ messages in thread
From: Krzysztof Halasa @ 2007-07-14 14:43 UTC (permalink / raw)
  To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> In an earlier post to the list I described a hard lockup condition
> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
> a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
> and occurs on 2 out of 3 computers.
>
> Further testing has revealed that the lockup can be prevented on all
> computers by making sure the card is installed on the primary PCI bus.
> If the card is installed in a slot on the secondary PCI bus (behind a
> PCI to PCI bridge) the lockup occurs.

Does the machine #3 have a PCI slot connected to a "secondary" bus?
Have you tried with any other machine with a secondary bus?

> Are there any PCI tuning registers that I can tweak to get around
> this problem?  Any changes I could make to the e100 driver to fix this?

Could be a hardware/BIOS problem on machines #1 and #2. Could be
a Linux bug as well, though similar configurations are known to work
fine. I don't think it has anything to do with IRQs.

Perhaps it doesn't like a bridge (on the card) behind a bridge
(on the motherboard). I would test with another multiport card
such as old DLink DFE-570TX (using a DEC 21150 bridge and four
21143 Ethernet chips).

I'd probably use some PCI analyzer or, at least, I'd check
the bus state with a multimeter.
-- 
Krzysztof Halasa


* Re: e100 PCI bridge problem
  2007-07-14 14:43 ` Krzysztof Halasa
@ 2007-07-14 23:17   ` William Montgomery
  2007-07-14 23:49     ` Krzysztof Halasa
  0 siblings, 1 reply; 14+ messages in thread
From: William Montgomery @ 2007-07-14 23:17 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>  
>
>>In an earlier post to the list I described a hard lockup condition
>>that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>a 4 port 10/100 fast ethernet card.  The lockup is easily repeatable
>>and occurs on 2 out of 3 computers.
>>
>>Further testing has revealed that the lockup can be prevented on all
>>computers by making sure the card is installed on the primary PCI bus.
>>If the card is installed in a slot on the secondary PCI bus (behind a
>>PCI to PCI bridge) the lockup occurs.
>>    
>>
>
>Does the machine #3 have a PCI slot connected to a "secondary" bus?
>Have you tried with any other machine with a secondary bus?
>
>  
>
The #3 machine doesn't have a secondary bus.  #1 and #2 are from 2 
different vendors (#1 Advantech, #2 Axiomtek) and I haven't tried any 
other machines.

>>Are there any PCI tuning registers that I can tweak to get around
>>this problem?  Any changes I could make to the e100 driver to fix this?
>>    
>>
>
>Could be a hardware/BIOS problem on machines #1 and #2. Could be
>a Linux bug as well, though similar configurations are known to work
>fine. I don't think it has anything to do with IRQs.
>
>Perhaps it doesn't like a bridge (on the card) behind a bridge
>(on the motherboard). I would test with another multiport card
>such as old DLink DFE-570TX (using a DEC 21150 bridge and four
>21143 Ethernet chips).
>
>I'd probably use some PCI analyzer or, at least, I'd check
>the bus state with a multimeter.
>  
>
The #1 and #2 machines are known to work with an older Adaptec ANA-62044 
4port NIC (tulip based) with an onboard Intel 21154 bridge chip.  The 
card I am having problems with uses an onboard Hint Corp HB6 Universal 
PCI-PCI bridge.

I am using a PCI analyzer and it shows the bus in an idle state after 
the lockup.  The PCI transactions just prior to the lockup show a couple 
of interrupts from the card which appear to be handled correctly.  
Anything I should be looking for in particular?



* Re: e100 PCI bridge problem
  2007-07-14 23:17   ` William Montgomery
@ 2007-07-14 23:49     ` Krzysztof Halasa
  2007-07-15  1:27       ` William Montgomery
  2007-07-17 18:29       ` William Montgomery
  0 siblings, 2 replies; 14+ messages in thread
From: Krzysztof Halasa @ 2007-07-14 23:49 UTC (permalink / raw)
  To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> I am using a PCI analyzer and it shows the bus in an idle state after
> the lockup.  The PCI transactions just prior to the lockup show a
> couple of interrupts from the card which appear to be handled
> correctly.  Anything I should be looking for in particular?

I'd try to check with other machine using "secondary" bus slot.
BTW: Are you able to analyze the "primary" bus transactions while
using the card in "secondary" bus? Perhaps there is something
wrong in front of the motherboard bridge?

A broken motherboard may be hard to diagnose, unfortunately.

Can you post something like "lspci -vv" taken on both machines?
-- 
Krzysztof Halasa


* Re: e100 PCI bridge problem
  2007-07-14 23:49     ` Krzysztof Halasa
@ 2007-07-15  1:27       ` William Montgomery
  2007-07-17 18:29       ` William Montgomery
  1 sibling, 0 replies; 14+ messages in thread
From: William Montgomery @ 2007-07-15  1:27 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>  
>
>>I am using a PCI analyzer and it shows the bus in an idle state after
>>the lockup.  The PCI transactions just prior to the lockup show a
>>couple of interrupts from the card which appear to be handled
>>correctly.  Anything I should be looking for in particular?
>>    
>>
>
>I'd try to check with other machine using "secondary" bus slot.
>BTW: Are you able to analyze the "primary" bus transactions while
>using the card in "secondary" bus? Perhaps there is something
>wrong in front of the motherboard bridge?
>
>A broken motherboard may be hard to diagnose, unfortunately.
>
>Can you post something like "lspci -vv" taken on both machines?
>  
>
I will post more info on Monday when I am able to power them up.

I'm not so sure the motherboard is broken; I am leaning more towards a 
misconfigured bridge.  This computer is a 4U 19 inch rackmount chassis 
with a PICMG CPU and a 12 slot PCI backplane.  I have done a lot of 
testing with this box trying to characterize this problem.  In one case 
I put 3 Intel PRO 100S NICs on the secondary PCI bus and they ran under 
heavy stress test loads overnight.  The 4 port NIC seems to be the only 
card that doesn't want to cooperate.


* Re: e100 PCI bridge problem
  2007-07-14 23:49     ` Krzysztof Halasa
  2007-07-15  1:27       ` William Montgomery
@ 2007-07-17 18:29       ` William Montgomery
  2007-07-17 18:55         ` Kok, Auke
  2007-07-17 20:54         ` Krzysztof Halasa
  1 sibling, 2 replies; 14+ messages in thread
From: William Montgomery @ 2007-07-17 18:29 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>  
>
>>I am using a PCI analyzer and it shows the bus in an idle state after
>>the lockup.  The PCI transactions just prior to the lockup show a
>>couple of interrupts from the card which appear to be handled
>>correctly.  Anything I should be looking for in particular?
>>    
>>
>
>I'd try to check with other machine using "secondary" bus slot.
>BTW: Are you able to analyze the "primary" bus transactions while
>using the card in "secondary" bus? Perhaps there is something
>wrong in front of the motherboard bridge?
>
>  
>
I am able to analyze the primary bus while using the card in the 
secondary, and I see a very interesting thing on lockup: the primary 
side appears to be stuck on a read access to the memory mapped control 
regs of the LAN chip (82559), in what appears to be infinite target 
retries to the same address.  Unfortunately I haven't been able to 
capture what occurs just prior to this happening.  This is quite 
different from what I capture on the secondary side, which is an idle bus.

I have posted the lspci -vv listing below...

>A broken motherboard may be hard to diagnose, unfortunately.
>
>Can you post something like "lspci -vv" taken on both machines?
>  
>
Here is the lspci -vv on the machine with lockups (edited for brevity):

00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Ho
        Subsystem: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Step
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
        Latency: 0
        Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [e4] #09 [1105]

00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR+
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-0000afff
        Memory behind bridge: f4000000-f6ffffff
        Prefetchable memory behind bridge: 10000000-103fffff
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Bus: primary=01, secondary=02, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-00009fff
        Memory behind bridge: f4000000-f5ffffff
        Prefetchable memory behind bridge: 0000000010000000-0000000010300000
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [b0] Slot ID: 0 slots, First-, chassis 00

02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 15) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-00009fff
        Memory behind bridge: f4000000-f5ffffff
        Prefetchable memory behind bridge: 0000000010000000-0000000010300000
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] #06 [0080]
        Capabilities: [a0] Vital Product Data

03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f5403000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9000 [size=64]
        Region 2: Memory at f5000000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10000000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at f5401000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9400 [size=64]
        Region 2: Memory at f5100000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10100000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f5400000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9800 [size=64]
        Region 2: Memory at f5200000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10200000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f5402000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9c00 [size=64]
        Region 2: Memory at f5300000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10300000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

======================
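
One thing that can be checked from the listing above is whether each
NIC's memory BARs fall inside the memory window of every bridge in
front of it, since an access outside a window would not be forwarded.
The sketch below is only an illustration: the addresses are copied from
the listing, the helper names are hypothetical, and it covers just the
non-prefetchable memory windows (not I/O or prefetchable ranges).

```python
def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def bar(base, size):
    """Inclusive address range of a BAR given its base and size."""
    return (base, base + size - 1)


# "Memory behind bridge" windows, outermost bridge first (from the listing).
MEM_WINDOWS = {
    "00:1e.0": (0xF4000000, 0xF6FFFFFF),
    "01:0c.0": (0xF4000000, 0xF5FFFFFF),
    "02:06.0": (0xF4000000, 0xF5FFFFFF),
}

# Region 0 (4K) and Region 2 (1M) of the four Ethernet functions.
MEM_BARS = {
    "03:04.0": [bar(0xF5403000, 0x1000), bar(0xF5000000, 0x100000)],
    "03:05.0": [bar(0xF5401000, 0x1000), bar(0xF5100000, 0x100000)],
    "03:06.0": [bar(0xF5400000, 0x1000), bar(0xF5200000, 0x100000)],
    "03:07.0": [bar(0xF5402000, 0x1000), bar(0xF5300000, 0x100000)],
}


def unreachable_bars(windows, bars):
    """Return (device, range, bridge) for each BAR outside a bridge window."""
    problems = []
    for dev, ranges in bars.items():
        for rng in ranges:
            for bridge, window in windows.items():
                if not contains(window, rng):
                    problems.append((dev, rng, bridge))
    return problems


if __name__ == "__main__":
    bad = unreachable_bars(MEM_WINDOWS, MEM_BARS)
    print("all memory BARs reachable" if not bad else bad)
```

With the values from the listing, every memory BAR lies inside all
three windows, so this particular check does not point at an address
decoding problem.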







* Re: e100 PCI bridge problem
  2007-07-17 18:29       ` William Montgomery
@ 2007-07-17 18:55         ` Kok, Auke
  2007-07-17 19:37           ` William Montgomery
  2007-07-17 21:04           ` Krzysztof Halasa
  2007-07-17 20:54         ` Krzysztof Halasa
  1 sibling, 2 replies; 14+ messages in thread
From: Kok, Auke @ 2007-07-17 18:55 UTC (permalink / raw)
  To: William Montgomery; +Cc: Krzysztof Halasa, linux-kernel

William Montgomery wrote:
> Krzysztof Halasa wrote:
> 
>> William Montgomery <william@opinicus.com> writes:
>>
>>  
>>
>>> I am using a PCI analyzer and it shows the bus in an idle state after
>>> the lockup.  The PCI transactions just prior to the lockup show a
>>> couple of interrupts from the card which appear to be handled
>>> correctly.  Anything I should be looking for in particular?
>>>    
>>>
>> I'd try to check with other machine using "secondary" bus slot.
>> BTW: Are you able to analyze the "primary" bus transactions while
>> using the card in "secondary" bus? Perhaps there is something
>> wrong in front of the motherboard bridge?
>>
>>  
>>
> I am able to analyze the primary bus while the using the card in the 
> secondary and I see a very interesting thing on lockup - the primary 
> side appears to be stuck on a read access to the memory mapped control 
> regs of the LAN chip (82559) in what appears to be infinite target 
> retries to the same address.  Unfortunately I havent been able to 
> capture what occurs just prior to this happening.  This is quite 
> different from what I capture on the secondary side; which is an idle bus
> 
> I have posted the lspci -vv listing below...
> 
>> A broken motherboard may be hard to diagnose, unfortunately.
>>
>> Can you post something like "lspci -vv" taken on both machines?
>>  
>>
> Here is the lspci -vv on the machine with lockups (edited for brevity):
> 
> 00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Ho
>         Subsystem: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Step
>         Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
>         Latency: 0
>         Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
>         Capabilities: [e4] #09 [1105]
> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>         Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR+

PERR+ set... not good - this certainly will cause major issues


Auke


* Re: e100 PCI bridge problem
  2007-07-17 18:55         ` Kok, Auke
@ 2007-07-17 19:37           ` William Montgomery
  2007-07-17 21:04           ` Krzysztof Halasa
  1 sibling, 0 replies; 14+ messages in thread
From: William Montgomery @ 2007-07-17 19:37 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Krzysztof Halasa, linux-kernel

Kok, Auke wrote:

>> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to 
>> PCI Bridge (rev 82) (prog-if 00 [Normal decode])
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
>> ParErr- Stepping- SERR+ FastB2B-
>>         Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast 
>> >TAbort- <TAbort- <MAbort- >SERR- <PERR+
>
>
> PERR+ set... not good - this certainly will cause major issues
>
I know it sounds that way based on the definition (Detected parity error
on hub side); however, I have two other identical systems that have been
running fine for months with this same bit set. The only difference is
that they use an Adaptec ANA-64044 (a 4-port 10/100 fast ethernet card,
unfortunately discontinued).

It seems the Pericom PCI to PCI bridge is having a problem talking to 
the LAN controllers behind the Hint bridge.



* Re: e100 PCI bridge problem
  2007-07-17 18:29       ` William Montgomery
  2007-07-17 18:55         ` Kok, Auke
@ 2007-07-17 20:54         ` Krzysztof Halasa
  1 sibling, 0 replies; 14+ messages in thread
From: Krzysztof Halasa @ 2007-07-17 20:54 UTC (permalink / raw)
  To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> I am able to analyze the primary bus while using the card in the
> secondary, and I see a very interesting thing on lockup - the primary
> side appears to be stuck on a read access to the memory mapped control
> regs of the LAN chip (82559) in what appears to be infinite target
> retries to the same address.  Unfortunately I haven't been able to
> capture what occurs just prior to this happening.  This is quite
> different from what I capture on the secondary side, which is an idle
> bus.

Seems like a bridge problem, doesn't it?
I wonder if the infinite retry hits the same register every time?
Could it be a deadlock generated by/in the bridge? I'd look at
the bridge specs and maybe updates; perhaps they have some hints.

> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to
> PCI Bridge (rev 82) (prog-if 00 [Normal decode])
>        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR+
                           ^^^^^^
>        Bus: primary=00, secondary=01, subordinate=03, sec-latency=32

I wonder why PERR is set and which device on bus #0 or #1 causes it?

> 01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev
> 02) (prog-if 00 [Normal decode])
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B-
>        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
>        Bus: primary=01, secondary=02, subordinate=03, sec-latency=32

It seems PERR on bus #0 isn't generated by this bridge, at least
it doesn't signal that in its status. Who knows, it may be unrelated.
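
For reference, the "Status:" flags compared above come from the 16-bit
PCI Status register at config offset 0x06. The sketch below decodes a raw
status word into the same flag string lspci prints. The two sample values
(0x8080 and 0x0290) are not from the original posts; they are
reconstructed from the quoted listings and should be treated as
assumptions.

```python
# Minimal sketch: decode a 16-bit PCI Status register value into
# lspci-style flags. Bit positions follow the standard PCI Status
# register layout (config offset 0x06).
FLAGS_LOW = [("Cap", 4), ("66Mhz", 5), ("UDF", 6), ("FastB2B", 7), ("ParErr", 8)]
FLAGS_HIGH = [(">TAbort", 11), ("<TAbort", 12), ("<MAbort", 13),
              (">SERR", 14), ("<PERR", 15)]
DEVSEL = {0: "fast", 1: "medium", 2: "slow"}  # bits 10:9, value 3 is reserved


def decode_status(status):
    """Return the flag string for a raw status word."""
    def flag(name, bit):
        return name + ("+" if status & (1 << bit) else "-")

    parts = [flag(name, bit) for name, bit in FLAGS_LOW]
    parts.append("DEVSEL=" + DEVSEL.get((status >> 9) & 0x3, "??"))
    parts += [flag(name, bit) for name, bit in FLAGS_HIGH]
    return " ".join(parts)


# Assumed raw values, rebuilt from the quoted flag strings:
INTEL_BRIDGE_STATUS = 0x8080    # 00:1e.0: FastB2B+ and <PERR+ (bit 15)
PERICOM_BRIDGE_STATUS = 0x0290  # 01:0c.0: Cap+, FastB2B+, DEVSEL=medium

print(decode_status(INTEL_BRIDGE_STATUS))
print(decode_status(PERICOM_BRIDGE_STATUS))
```

Bit 15 (<PERR, "Detected Parity Error") is the only error bit set in the
first value, and no error bit is set in the second, which matches the
reading above that the Pericom bridge does not report a parity error in
its own status.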


Have you tried to perform the same tests on the other machine?
-- 
Krzysztof Halasa


* Re: e100 PCI bridge problem
  2007-07-17 18:55         ` Kok, Auke
  2007-07-17 19:37           ` William Montgomery
@ 2007-07-17 21:04           ` Krzysztof Halasa
  1 sibling, 0 replies; 14+ messages in thread
From: Krzysztof Halasa @ 2007-07-17 21:04 UTC (permalink / raw)
  To: Kok, Auke; +Cc: William Montgomery, linux-kernel

"Kok, Auke" <auke-jan.h.kok@intel.com> writes:

> PERR+ set... not good - this certainly will cause major issues

Unfortunately some devices assert PERR without a good reason,
and it may do no special harm.

Should be handled and cleared, probably.
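
As a rough illustration of "handled and cleared": the error bits in the
PCI Status register are write-1-to-clear, so clearing PERR means writing
back a 1 in that bit position and 0 everywhere else. The sketch below only
models this on an in-memory copy of config space. The helper names and the
sample value 0x8080 are hypothetical, and nothing here touches real
hardware.

```python
import struct

PCI_STATUS_OFFSET = 0x06  # 16-bit Status register in PCI config space
# Write-1-to-clear error bits: 8 (ParErr), 11-13 (aborts), 14 (>SERR), 15 (<PERR)
ERROR_BITS = 0xF900


def read_status(config):
    """Extract the little-endian status word from a config-space buffer."""
    return struct.unpack_from("<H", config, PCI_STATUS_OFFSET)[0]


def clear_value(status):
    """Value to write back: ones only in the error bits that are set."""
    return status & ERROR_BITS


def simulate_rw1c_write(status, written):
    """Model the register: a written 1 clears an error bit, 0 leaves it."""
    return status & ~(written & ERROR_BITS) & 0xFFFF


# Hypothetical config space with FastB2B+ and <PERR+ set (assumed 0x8080).
config = bytearray(64)
struct.pack_into("<H", config, PCI_STATUS_OFFSET, 0x8080)

status = read_status(config)
to_write = clear_value(status)
after = simulate_rw1c_write(status, to_write)
print(hex(status), hex(to_write), hex(after))
```

If the bit comes back set after being cleared, that would suggest the
parity error is still being detected rather than being a stale flag left
over from boot.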
-- 
Krzysztof Halasa


end of thread, other threads:[~2007-07-17 21:04 UTC | newest]

Thread overview: 14+ messages
2007-07-13 17:37 e100 PCI bridge problem William Montgomery
2007-07-13 20:36 ` Kok, Auke
2007-07-13 22:30   ` William Montgomery
2007-07-13 22:41     ` Kok, Auke
2007-07-14  0:54       ` William Montgomery
2007-07-14 14:43 ` Krzysztof Halasa
2007-07-14 23:17   ` William Montgomery
2007-07-14 23:49     ` Krzysztof Halasa
2007-07-15  1:27       ` William Montgomery
2007-07-17 18:29       ` William Montgomery
2007-07-17 18:55         ` Kok, Auke
2007-07-17 19:37           ` William Montgomery
2007-07-17 21:04           ` Krzysztof Halasa
2007-07-17 20:54         ` Krzysztof Halasa
