* e100 PCI bridge problem
From: William Montgomery @ 2007-07-13 17:37 UTC
To: linux-kernel

In an earlier post to the list I described a hard lockup condition
that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
and occurs on 2 out of 3 computers.

Further testing has revealed that the lockup can be prevented on all
computers by making sure the card is installed on the primary PCI bus.
If the card is installed in a slot on the secondary PCI bus (behind a
PCI to PCI bridge) the lockup occurs.

Are there any PCI tuning registers that I can tweak to get around
this problem? Any changes I could make to the e100 driver to fix this?

Any help appreciated.

Regards,
Wm

* Re: e100 PCI bridge problem
From: Kok, Auke @ 2007-07-13 20:36 UTC
To: William Montgomery; +Cc: linux-kernel

William Montgomery wrote:
> In an earlier post to the list I described a hard lockup condition
> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
> a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
> and occurs on 2 out of 3 computers.
>
> Further testing has revealed that the lockup can be prevented on all
> computers by making sure the card is installed on the primary PCI bus.
> If the card is installed in a slot on the secondary PCI bus (behind a
> PCI to PCI bridge) the lockup occurs.

sounds like int-A/B/C/D routing issues

> Are there any PCI tuning registers that I can tweak to get around
> this problem? Any changes I could make to the e100 driver to fix this?

this issue might be resolvable by quirking the bridge chips and
adjusting any APIC where needed. Unfortunately I don't know much about
this but it's physically not possible from the e100 driver. The special
(non-intel) card that has these 4 ports onboard contains a bridge chip
itself which explains the issues. Even a BIOS issue could be the cause
here.

Perhaps the linuxfirmwarekit will reveal more information. In any case,
fixing this in software would be a gigantic effort.

Auke
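
For readers unfamiliar with the int-A/B/C/D routing mentioned above: behind a conventional PCI-to-PCI bridge, the interrupt pin a device asserts is normally rotated by the device's slot number before it appears on the upstream side of the bridge. The following is a minimal Python sketch of that rotation only. The function names and the slot numbers in the example are illustrative assumptions, and a particular card or backplane may be wired differently.

```python
# Sketch of the conventional PCI-to-PCI bridge interrupt rotation ("swizzle").
# Pins are numbered 1..4 for INTA..INTD. Names and slot numbers here are
# illustrative assumptions, not taken from any particular board.

PIN_NAMES = {1: "INTA", 2: "INTB", 3: "INTC", 4: "INTD"}


def swizzle(pin, slot):
    """Return the pin seen on the upstream side of one bridge.

    pin  -- pin asserted on the bridge's secondary bus (1..4)
    slot -- device (slot) number of the asserting device on that bus
    """
    if pin not in PIN_NAMES:
        raise ValueError("pin must be in 1..4")
    return ((pin - 1 + slot) % 4) + 1


def pin_upstream(pin, slots):
    """Apply the rotation once per bridge crossed.

    slots lists the slot number at each level, starting with the end
    device and moving toward the host.
    """
    for slot in slots:
        pin = swizzle(pin, slot)
    return pin


if __name__ == "__main__":
    # Hypothetical: a device in slot 5 asserting INTA, behind a bridge that
    # itself sits in slot 6 of the next bus up.
    result = pin_upstream(1, [5, 6])
    print(PIN_NAMES[result])
```

If the pin computed this way disagrees with what the BIOS or the kernel programs for the slot, interrupts can end up delivered to the wrong line, which is the kind of routing issue being suggested here.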

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-13 22:30 UTC
To: Kok, Auke; +Cc: linux-kernel

Thanks for responding. I am very interested to find the source of this
problem.

Kok, Auke wrote:
> William Montgomery wrote:
>
>> In an earlier post to the list I described a hard lockup condition
>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>> a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
>> and occurs on 2 out of 3 computers.
>>
>> Further testing has revealed that the lockup can be prevented on all
>> computers by making sure the card is installed on the primary PCI bus.
>> If the card is installed in a slot on the secondary PCI bus (behind a
>> PCI to PCI bridge) the lockup occurs.
>
> sounds like int-A/B/C/D routing issues

The strange thing is that all the ports on the card work fine for a few
minutes, then when some condition (as yet unknown) occurs the system
locks up hard. I am currently using a PCI bus analyzer to capture bus
activity just prior to the lockup to try and find out what leads up to
this condition.

>> Are there any PCI tuning registers that I can tweak to get around
>> this problem? Any changes I could make to the e100 driver to fix this?
>
> this issue might be resolvable by quirking the bridge chips and
> adjusting any APIC where needed. Unfortunately I don't know much about
> this but it's physically not possible from the e100 driver. The
> special (non-intel) card that has these 4 ports onboard contains a
> bridge chip itself which explains the issues. Even a BIOS issue could
> be the cause here.

I am aware of the bridge chip on the card but not sure what you mean
when you say this explains the issues? I sure would like to figure out a
way around this.

The PCI info follows:

00:00.0 Host bridge: Intel Corp.
82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 02)
00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 02)
01:08.0 Ethernet controller: Intel Corp. 82801BD PRO/100 VE (CNR) Ethernet Controller (rev 82)
01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02)
02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 15)
03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)

> Perhaps the linuxfirmwarekit will reveal more information. In any
> case, fixing this in software would be a gigantic effort.

I will look into that on Monday and report what I find. It seems like it
is premature to say how much effort the fix will take since the problem
is not yet known? At least not known to me yet. I would just like to
find out what parameters on the bridge/bridges might affect this problem
and how to modify them.

> Auke

* Re: e100 PCI bridge problem
From: Kok, Auke @ 2007-07-13 22:41 UTC
To: William Montgomery; +Cc: linux-kernel

William Montgomery wrote:
> Thanks for responding. I am very interested to find the source of this
> problem.
>
> Kok, Auke wrote:
>
>> William Montgomery wrote:
>>
>>> In an earlier post to the list I described a hard lockup condition
>>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>> a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
>>> and occurs on 2 out of 3 computers.
>>>
>>> Further testing has revealed that the lockup can be prevented on all
>>> computers by making sure the card is installed on the primary PCI bus.
>>> If the card is installed in a slot on the secondary PCI bus (behind a
>>> PCI to PCI bridge) the lockup occurs.
>>
>> sounds like int-A/B/C/D routing issues
>
> The strange thing is that all the ports on the card work fine for a few
> minutes, then when some condition (as yet unknown) occurs the system
> locks up hard. I am currently using a PCI bus analyzer to capture bus
> activity just prior to the lockup to try and find out what leads up to
> this condition.

are you running any form of irqbalance, either in-kernel (bad) or the
userspace (better) one?

>>> Are there any PCI tuning registers that I can tweak to get around
>>> this problem? Any changes I could make to the e100 driver to fix this?
>>
>> this issue might be resolvable by quirking the bridge chips and
>> adjusting any APIC where needed. Unfortunately I don't know much about
>> this but it's physically not possible from the e100 driver. The
>> special (non-intel) card that has these 4 ports onboard contains a
>> bridge chip itself which explains the issues. Even a BIOS issue could
>> be the cause here.
>
> I am aware of the bridge chip on the card but not sure what you mean
> when you say this explains the issues? I sure would like to figure out
> a way around this.

irq routing in linux may not be the same as in windows. I have no idea
how to compare them either (dmesg will show the linux setup, but I don't
know how to retrieve this info under linux).

> The PCI info follows:
> 00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM
> Controller/Host-Hub Interface (rev 03)
> 00:02.0 VGA compatible controller: Intel Corp.
> 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
> 00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 02)
> 00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 02)
> 00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI
> Controller (rev 02)
> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to
> PCI Bridge (rev 82)
> 00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 02)
> 00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100
> Storage Controller (rev 02)
> 00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 02)
> 01:08.0 Ethernet controller: Intel Corp. 82801BD PRO/100 VE (CNR)
> Ethernet Controller (rev 82)
> 01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02)
> 02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge
> (non-transparent mode) (rev 15)
> 03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
> (rev 08)
> 03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
> (rev 08)
> 03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
> (rev 08)
> 03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
> (rev 08)
>
>> Perhaps the linuxfirmwarekit will reveal more information. In any
>> case, fixing this in software would be a gigantic effort.
>>
> I will look into that on Monday and report what I find. It seems like
> it is premature to say how much effort the fix will take since the
> problem is not yet known? At least not known to me yet. I would just
> like to find out what parameters on the bridge/bridges might affect this
> problem and how to modify them.

I personally have no idea and am not knowledgeable enough on this issue,
sorry :)

Auke

>> Auke

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-14 0:54 UTC
To: Kok, Auke; +Cc: linux-kernel

Kok, Auke wrote:
> William Montgomery wrote:
>
>> Thanks for responding. I am very interested to find the source of
>> this problem.
>>
>> Kok, Auke wrote:
>>
>>> William Montgomery wrote:
>>>
>>>> In an earlier post to the list I described a hard lockup condition
>>>> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>>> a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
>>>> and occurs on 2 out of 3 computers.
>>>>
>>>> Further testing has revealed that the lockup can be prevented on all
>>>> computers by making sure the card is installed on the primary PCI bus.
>>>> If the card is installed in a slot on the secondary PCI bus (behind a
>>>> PCI to PCI bridge) the lockup occurs.
>>>
>>> sounds like int-A/B/C/D routing issues
>>
>> The strange thing is that all the ports on the card work fine for a
>> few minutes, then when some condition (as yet unknown) occurs the
>> system locks up hard. I am currently using a PCI bus analyzer to
>> capture bus activity just prior to the lockup to try and find out
>> what leads up to this condition.
>
> are you running any form of irqbalance, either in-kernel (bad) or the
> userspace (better) one?

No. This is a Pentium 4 - single core, 2.8GHz.

>>>> Are there any PCI tuning registers that I can tweak to get around
>>>> this problem? Any changes I could make to the e100 driver to fix
>>>> this?
>>>
>>> this issue might be resolvable by quirking the bridge chips and
>>> adjusting any APIC where needed. Unfortunately I don't know much
>>> about this but it's physically not possible from the e100 driver.
>>> The special (non-intel) card that has these 4 ports onboard contains
>>> a bridge chip itself which explains the issues.
>>> Even a BIOS issue could be the cause here.
>>
>> I am aware of the bridge chip on the card but not sure what you mean
>> when you say this explains the issues? I sure would like to figure
>> out a way around this.
>
> irq routing in linux may not be the same as in windows. I have no idea
> how to compare them either (dmesg will show the linux setup, but I
> don't know how to retrieve this info under linux).

Not sure how windows applies here; I only use Linux. The main data point
so far is that the card works fine when on the primary PCI bus but locks
up hard after a few minutes when installed in a slot behind a PCI to PCI
bridge.

I can provide the dmesg info on Monday.

* Re: e100 PCI bridge problem
From: Krzysztof Halasa @ 2007-07-14 14:43 UTC
To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> In an earlier post to the list I described a hard lockup condition
> that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
> a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
> and occurs on 2 out of 3 computers.
>
> Further testing has revealed that the lockup can be prevented on all
> computers by making sure the card is installed on the primary PCI bus.
> If the card is installed in a slot on the secondary PCI bus (behind a
> PCI to PCI bridge) the lockup occurs.

Does the machine #3 have a PCI slot connected to a "secondary" bus?
Have you tried with any other machine with a secondary bus?

> Are there any PCI tuning registers that I can tweak to get around
> this problem? Any changes I could make to the e100 driver to fix this?

Could be a hardware/BIOS problem on machines #1 and #2. Could be
a Linux bug as well, though similar configurations are known to work
fine. I don't think it has anything to do with IRQs.

Perhaps it doesn't like a bridge (on the card) behind a bridge
(on the motherboard). I would test with another multiport card
such as old DLink DFE-570TX (using a DEC 21150 bridge and four
21143 Ethernet chips).

I'd probably use some PCI analyzer or, at least, I'd check
the bus state with a multimeter.
-- 
Krzysztof Halasa

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-14 23:17 UTC
To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>>In an earlier post to the list I described a hard lockup condition
>>that occurs on linux kernels 2.4.22, 2.6.13, and 2.6.17 when using
>>a 4 port 10/100 fast ethernet card. The lockup is easily repeatable
>>and occurs on 2 out of 3 computers.
>>
>>Further testing has revealed that the lockup can be prevented on all
>>computers by making sure the card is installed on the primary PCI bus.
>>If the card is installed in a slot on the secondary PCI bus (behind a
>>PCI to PCI bridge) the lockup occurs.
>
>Does the machine #3 have a PCI slot connected to a "secondary" bus?
>Have you tried with any other machine with a secondary bus?

The #3 machine doesn't have a secondary bus. #1 and #2 are from 2
different vendors (#1 Advantech - #2 Axiomtek) and I haven't tried any
other machines.

>>Are there any PCI tuning registers that I can tweak to get around
>>this problem? Any changes I could make to the e100 driver to fix this?
>
>Could be a hardware/BIOS problem on machines #1 and #2. Could be
>a Linux bug as well, though similar configurations are known to work
>fine. I don't think it has anything to do with IRQs.
>
>Perhaps it doesn't like a bridge (on the card) behind a bridge
>(on the motherboard). I would test with another multiport card
>such as old DLink DFE-570TX (using a DEC 21150 bridge and four
>21143 Ethernet chips).
>
>I'd probably use some PCI analyzer or, at least, I'd check
>the bus state with a multimeter.

The #1 and #2 machines are known to work with an older Adaptec ANA-62044
4port NIC (tulip based) with an onboard Intel 21154 bridge chip.
The card I am having problems with uses an onboard Hint Corp HB6
Universal PCI-PCI bridge.

I am using a PCI analyzer and it shows the bus in an idle state after
the lockup. The PCI transactions just prior to the lockup show a couple
of interrupts from the card which appear to be handled correctly.
Anything I should be looking for in particular?

* Re: e100 PCI bridge problem
From: Krzysztof Halasa @ 2007-07-14 23:49 UTC
To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> I am using a PCI analyzer and it shows the bus in an idle state after
> the lockup. The PCI transactions just prior to the lockup show a
> couple of interrupts from the card which appear to be handled
> correctly. Anything I should be looking for in particular?

I'd try to check with other machine using "secondary" bus slot.
BTW: Are you able to analyze the "primary" bus transactions while
using the card in "secondary" bus? Perhaps there is something
wrong in front of the motherboard bridge?

A broken motherboard may be hard to diagnose, unfortunately.

Can you post something like "lspci -vv" taken on both machines?
-- 
Krzysztof Halasa

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-15 1:27 UTC
To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>>I am using a PCI analyzer and it shows the bus in an idle state after
>>the lockup. The PCI transactions just prior to the lockup show a
>>couple of interrupts from the card which appear to be handled
>>correctly. Anything I should be looking for in particular?
>
>I'd try to check with other machine using "secondary" bus slot.
>BTW: Are you able to analyze the "primary" bus transactions while
>using the card in "secondary" bus? Perhaps there is something
>wrong in front of the motherboard bridge?
>
>A broken motherboard may be hard to diagnose, unfortunately.
>
>Can you post something like "lspci -vv" taken on both machines?

I will post more info on Monday when I am able to power them up. I'm not
so sure the motherboard is broken; I am leaning more towards a
misconfigured bridge. This computer is a 4U 19 inch rackmount chassis
with a PICMG CPU and a 12 slot PCI backplane. I have done a lot of
testing with this box trying to characterize this problem. In one case I
have put 3 Intel PRO 100S NICs on the secondary PCI bus and they ran
under heavy stress test loads overnight. The 4 port NIC seems to be the
only card that doesn't want to cooperate.

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-17 18:29 UTC
To: Krzysztof Halasa; +Cc: linux-kernel

Krzysztof Halasa wrote:

>William Montgomery <william@opinicus.com> writes:
>
>>I am using a PCI analyzer and it shows the bus in an idle state after
>>the lockup. The PCI transactions just prior to the lockup show a
>>couple of interrupts from the card which appear to be handled
>>correctly. Anything I should be looking for in particular?
>
>I'd try to check with other machine using "secondary" bus slot.
>BTW: Are you able to analyze the "primary" bus transactions while
>using the card in "secondary" bus? Perhaps there is something
>wrong in front of the motherboard bridge?

I am able to analyze the primary bus while using the card in the
secondary and I see a very interesting thing on lockup - the primary
side appears to be stuck on a read access to the memory mapped control
regs of the LAN chip (82559) in what appears to be infinite target
retries to the same address. Unfortunately I haven't been able to
capture what occurs just prior to this happening. This is quite
different from what I capture on the secondary side, which is an idle
bus.

I have posted the lspci -vv listing below...

>A broken motherboard may be hard to diagnose, unfortunately.
>
>Can you post something like "lspci -vv" taken on both machines?

Here is the lspci -vv on the machine with lockups (edited for brevity):

00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Ho
        Subsystem: Intel Corp.
                82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Step
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
        Latency: 0
        Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [e4] #09 [1105]

00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR+
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-0000afff
        Memory behind bridge: f4000000-f6ffffff
        Prefetchable memory behind bridge: 10000000-103fffff
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Bus: primary=01, secondary=02, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-00009fff
        Memory behind bridge: f4000000-f5ffffff
        Prefetchable memory behind bridge: 0000000010000000-0000000010300000
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
        Capabilities: [dc] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [b0] Slot ID: 0 slots, First-, chassis 00

02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 15) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=32
        I/O behind bridge: 00009000-00009fff
        Memory behind bridge: f4000000-f5ffffff
        Prefetchable memory behind bridge: 0000000010000000-0000000010300000
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] #06 [0080]
        Capabilities: [a0] Vital Product Data

03:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f5403000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9000 [size=64]
        Region 2: Memory at f5000000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10000000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp.
                EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at f5401000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9400 [size=64]
        Region 2: Memory at f5100000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10100000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f5400000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9800 [size=64]
        Region 2: Memory at f5200000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10200000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

03:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp.
                EtherExpress PRO/100+ Management Adapter with Alert On LAN*
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f5402000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at 9c00 [size=64]
        Region 2: Memory at f5300000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 10300000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

======================
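
The "Bus: primary=..., secondary=..., subordinate=..." lines in the listing above are enough to reconstruct which bridges sit between the host and the four 82559 functions on bus 03. The following Python sketch does that from a trimmed copy of the bridge entries. The helper names and the regular expressions are illustrative assumptions about this listing's layout, not part of lspci itself.

```python
import re

# Bridge entries copied (trimmed) from the lspci -vv listing in this thread.
LSPCI_EXCERPT = """\
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82)
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=32
01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev 02)
        Bus: primary=01, secondary=02, subordinate=03, sec-latency=32
02:06.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 15)
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=32
"""

# Assumed layout: device lines start at column 0 with bus:dev.fn,
# attribute lines are indented.
DEV_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.+)$")
BUS_RE = re.compile(
    r"Bus: primary=([0-9a-f]+), secondary=([0-9a-f]+), subordinate=([0-9a-f]+)"
)


def parse_bridges(text):
    """Collect bus-number ranges for every bridge found in the text."""
    bridges = []
    current = None
    for line in text.splitlines():
        dev = DEV_RE.match(line)
        if dev:
            current = dev.group(1)
            continue
        bus = BUS_RE.search(line)
        if bus and current is not None:
            primary, secondary, subordinate = (int(x, 16) for x in bus.groups())
            bridges.append({
                "bdf": current,
                "primary": primary,
                "secondary": secondary,
                "subordinate": subordinate,
            })
    return bridges


def bridges_above(bus, bridges):
    """Bridges whose secondary..subordinate range covers bus, host side first."""
    chain = [b for b in bridges if b["secondary"] <= bus <= b["subordinate"]]
    return sorted(chain, key=lambda b: b["primary"])


if __name__ == "__main__":
    for bridge in bridges_above(3, parse_bridges(LSPCI_EXCERPT)):
        print(bridge["bdf"], "->", "bus %02x" % bridge["secondary"])
```

For bus 03 this yields three bridges in a row (ICH4 hub bridge, the Pericom bridge, then the Hint bridge on the card), so every access to the NIC registers crosses all three.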

* Re: e100 PCI bridge problem
From: Kok, Auke @ 2007-07-17 18:55 UTC
To: William Montgomery; +Cc: Krzysztof Halasa, linux-kernel

William Montgomery wrote:
> Krzysztof Halasa wrote:
>
>> William Montgomery <william@opinicus.com> writes:
>>
>>> I am using a PCI analyzer and it shows the bus in an idle state after
>>> the lockup. The PCI transactions just prior to the lockup show a
>>> couple of interrupts from the card which appear to be handled
>>> correctly. Anything I should be looking for in particular?
>>
>> I'd try to check with other machine using "secondary" bus slot.
>> BTW: Are you able to analyze the "primary" bus transactions while
>> using the card in "secondary" bus? Perhaps there is something
>> wrong in front of the motherboard bridge?
>
> I am able to analyze the primary bus while using the card in the
> secondary and I see a very interesting thing on lockup - the primary
> side appears to be stuck on a read access to the memory mapped control
> regs of the LAN chip (82559) in what appears to be infinite target
> retries to the same address. Unfortunately I haven't been able to
> capture what occurs just prior to this happening. This is quite
> different from what I capture on the secondary side, which is an idle
> bus.
>
> I have posted the lspci -vv listing below...
>
>> A broken motherboard may be hard to diagnose, unfortunately.
>>
>> Can you post something like "lspci -vv" taken on both machines?
>
> Here is the lspci -vv on the machine with lockups (edited for brevity):
>
> 00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM
> Controller/Ho
>        Subsystem: Intel Corp.
>                82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host
>        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Step
>        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort-
>        Latency: 0
>        Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
>        Capabilities: [e4] #09 [1105]
> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI
> Bridge (rev 82) (prog-if 00 [Normal decode])
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B-
>        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR+

PERR+ set... not good - this certainly will cause major issues

Auke

* Re: e100 PCI bridge problem
From: William Montgomery @ 2007-07-17 19:37 UTC
To: Kok, Auke; +Cc: Krzysztof Halasa, linux-kernel

Kok, Auke wrote:

>> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to
>> PCI Bridge (rev 82) (prog-if 00 [Normal decode])
>>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR+ FastB2B-
>>        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast
>> >TAbort- <TAbort- <MAbort- >SERR- <PERR+
>
> PERR+ set... not good - this certainly will cause major issues

I know it sounds that way based on the definition (Detected parity error
on hub side), however I have two other identical systems that have been
running fine for months - with this same bit set - only they use an
Adaptec ANA-64044 (4 port card - 10/100 fast ethernet - unfortunately
discontinued).

It seems the Pericom PCI to PCI bridge is having a problem talking to
the LAN controllers behind the Hint bridge.

* Re: e100 PCI bridge problem
From: Krzysztof Halasa @ 2007-07-17 21:04 UTC
To: Kok, Auke; +Cc: William Montgomery, linux-kernel

"Kok, Auke" <auke-jan.h.kok@intel.com> writes:

> PERR+ set... not good - this certainly will cause major issues

Unfortunately some devices assert PERR without a good reason,
and it may do no special harm. Should be handled and cleared,
probably.
-- 
Krzysztof Halasa
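
To make the PERR discussion concrete: the flags lspci prints on a "Status:" line are individual bits of the 16-bit PCI status register, and the error bits among them are cleared by writing a 1 back to the same position. The Python sketch below decodes a status word into lspci-style flags and computes the value that would clear whatever error bits are set. The bit positions follow the conventional PCI status register layout; the function names are illustrative, the example value 0x8080 is merely one value consistent with the 00:1e.0 flags shown in this thread, and how the write would actually be issued is deliberately left out.

```python
# Conventional PCI status register bits, in the order lspci prints them.
INFO_FLAGS = [
    ("Cap", 0x0010),      # capabilities list present
    ("66MHz", 0x0020),    # 66 MHz capable
    ("UDF", 0x0040),      # user definable features
    ("FastB2B", 0x0080),  # fast back-to-back capable
    ("ParErr", 0x0100),   # master data parity error
]
ERROR_FLAGS = [
    (">TAbort", 0x0800),  # signaled target abort
    ("<TAbort", 0x1000),  # received target abort
    ("<MAbort", 0x2000),  # received master abort
    (">SERR", 0x4000),    # signaled system error
    ("<PERR", 0x8000),    # detected parity error
]
DEVSEL_TIMING = {0: "fast", 1: "medium", 2: "slow"}

# Bits that are cleared by writing 1 to them.
RW1C_ERROR_MASK = 0x0100 | 0x0800 | 0x1000 | 0x2000 | 0x4000 | 0x8000


def decode_status(status):
    """Render a status word the way the Status: lines in this thread look."""
    parts = [name + ("+" if status & bit else "-") for name, bit in INFO_FLAGS]
    parts.append("DEVSEL=" + DEVSEL_TIMING.get((status >> 9) & 0x3, "??"))
    parts += [name + ("+" if status & bit else "-") for name, bit in ERROR_FLAGS]
    return " ".join(parts)


def clear_value(status):
    """Value to write back so that only the currently set error bits clear."""
    return status & RW1C_ERROR_MASK


if __name__ == "__main__":
    example = 0x8080  # assumed: FastB2B set, detected parity error set
    print(decode_status(example))
    print(hex(clear_value(example)))
```

Clearing the bit and then watching whether it comes back during a test run would be one way to tell a stale indication from an error that is still occurring.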

* Re: e100 PCI bridge problem
From: Krzysztof Halasa @ 2007-07-17 20:54 UTC
To: William Montgomery; +Cc: linux-kernel

William Montgomery <william@opinicus.com> writes:

> I am able to analyze the primary bus while using the card in the
> secondary and I see a very interesting thing on lockup - the primary
> side appears to be stuck on a read access to the memory mapped control
> regs of the LAN chip (82559) in what appears to be infinite target
> retries to the same address. Unfortunately I haven't been able to
> capture what occurs just prior to this happening. This is quite
> different from what I capture on the secondary side, which is an idle
> bus.

Seems like bridge problem, doesn't it? I wonder if the infinite
retry is the same register every time? Could it be a deadlock
generated by/in the bridge? I'd look at the bridge specs and maybe
updates, perhaps they have some hints.

> 00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to
> PCI Bridge (rev 82) (prog-if 00 [Normal decode])
>        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR+
                           ^^^^^^
>        Bus: primary=00, secondary=01, subordinate=03, sec-latency=32

I wonder why PERR is set and which device on bus #0 or #1 causes it?

> 01:0c.0 PCI bridge: Pericom Semiconductor: Unknown device 8150 (rev
> 02) (prog-if 00 [Normal decode])
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B-
>        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
>        Bus: primary=01, secondary=02, subordinate=03, sec-latency=32

It seems PERR on bus #0 isn't generated by this bridge, at least
it doesn't signal that in its status. Who knows, it may be unrelated.

Have you tried to perform the same tests on the other machine?
-- 
Krzysztof Halasa
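
One way to answer the "same register every time?" question from analyzer captures is to map each retried address onto the NIC function and register offset it falls in, then count the hits. The Python sketch below does this using the Region 0 (4K memory-mapped) windows from the lspci -vv listing earlier in the thread. The captured addresses in the example are made up for illustration, and the helper names are assumptions.

```python
from collections import Counter

# Region 0 windows (base, size) of the four 82559 functions, taken from the
# lspci -vv listing posted in this thread.
CSR_REGIONS = [
    ("03:04.0", 0xF5403000, 0x1000),
    ("03:05.0", 0xF5401000, 0x1000),
    ("03:06.0", 0xF5400000, 0x1000),
    ("03:07.0", 0xF5402000, 0x1000),
]


def locate(addr):
    """Return (function, offset) for an address, or None if outside all windows."""
    for bdf, base, size in CSR_REGIONS:
        if base <= addr < base + size:
            return bdf, addr - base
    return None


def summarize(addresses):
    """Count how often each (function, offset) pair shows up in a capture."""
    return Counter(locate(addr) for addr in addresses)


if __name__ == "__main__":
    # Hypothetical capture: addresses of the retried reads seen on the
    # primary bus across several lockups.
    captured = [0xF5401004, 0xF5401004, 0xF5401004]
    for target, count in summarize(captured).most_common():
        print(target, count)
```

If every lockup lands on the same function and offset, that points at one specific register access; if the target varies, the bridge path itself looks more suspect.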