* Re: eth*: transmit timed out since .27 (was: linux-2.4.27 released)
[not found] <566B962EB122634D86E6EE29E83DD808182C3236@hdsmsx403.hd.intel.com>
From: Len Brown @ 2004-08-16 17:52 UTC
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Oliver,
I'm glad that turning off "pci=noacpi" fixed your system.
I don't know why the legacy irqrouter didn't work, but
as ACPI works, I'm not going to worry about it;-)

I expect the "acpi=off" experiment would behave the same as
"pci=noacpi", but it looks like in your experiment you
mis-spelled that parameter as apci=off, so instead it was the
same as the default ACPI-enabled case.

Re: lots of interrupts on the same IRQ.
There are boot params to balance out the IRQs in PIC mode,
but what you want to do on this system is enable the IOAPIC
in your kernel config. The existence of the MADT in your
ACPI tables suggests you may have one. An IOAPIC will bring
additional interrupt pins to bear, usually allowing
the PCI interrupts to use IRQs > 16 where they may
not have to share so much.

cheers,
-Len
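As the message above notes, the misspelled apci=off had no effect and the boot behaved like the default case. A minimal sketch of how such a slip could be spotted: compare each boot parameter name against a list of known names and report near misses. The function name, the short KNOWN_PARAMS list and the example command line are assumptions made for illustration, not part of any kernel tool.

```python
import difflib

# Illustrative subset only (an assumption for this sketch); the real list of
# kernel boot parameters is far longer.
KNOWN_PARAMS = ["acpi", "pci", "noapic", "nolapic", "root", "ro", "rw", "console"]


def suspicious_params(cmdline, known=KNOWN_PARAMS):
    """Return (token, suggestions) pairs for parameter names that are not in
    `known` but closely resemble one or more entries in it."""
    findings = []
    for token in cmdline.split():
        name = token.split("=", 1)[0]  # "apci=off" -> "apci"
        if name in known:
            continue
        close = difflib.get_close_matches(name, known, n=3, cutoff=0.7)
        if close:
            findings.append((token, close))
    return findings


# Hypothetical command line containing the misspelling from this thread.
EXAMPLE_CMDLINE = "root=/dev/hda1 ro apci=off"

if __name__ == "__main__":
    # On a live system the string could be read from /proc/cmdline instead.
    for token, suggestions in suspicious_params(EXAMPLE_CMDLINE):
        print(f"{token!r} is not a known parameter; similar names: {suggestions}")
```

The suggestions are only string-similarity guesses, so more than one candidate may be listed and the reader still has to pick the intended one.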
* Re: eth*: transmit timed out since .27
From: Oliver Feiler @ 2004-08-16 18:44 UTC
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Hello Len,

Len Brown wrote:
> Oliver,
> I'm glad that turning off "pci=noacpi" fixed your system.
> I don't know why the legacy irqrouter didn't work, but
> as ACPI works, I'm not going to worry about it;-)

Well, it did work with 2.4.26, but I agree that it's better to get the
new stuff to work correctly. ;)

I just noticed that /proc/interrupts and /proc/pci, lspci still disagree
on the IRQ of the IDE device.

           CPU0
  0:     112337    IO-APIC-edge  timer
  1:          2    IO-APIC-edge  keyboard
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:       9296    IO-APIC-edge  ide0
 15:       9078    IO-APIC-edge  ide1
 17:         24   IO-APIC-level  eth1
 18:     125085   IO-APIC-level  eth0
 21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
 22:          0   IO-APIC-level  via82cxxx
 23:       2976   IO-APIC-level  eth2
NMI:          0
LOC:     112313
ERR:          0
MIS:         42

vs.

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
        Subsystem: Unknown device 1849:0571
        Flags: bus master, medium devsel, latency 32, IRQ 255
        I/O ports at fc00 [size=16]
        Capabilities: <available only to root>

This probably has to do with this boot message:
PCI: No IRQ known for interrupt pin A of device 00:11.1
I have found absolutely nothing that explains if this is an error or
just some sort of debug message one can ignore.

>
> I expect the "acpi=off" experiment would behave the same as
> "pci=noacpi", but it looks like in your experiment you
> mis-spelled that parameter as apci=off, so instead it was the
> same as the default ACPI-enabled case.

Oh, thanks for noticing. Stupid me.

>
> Re: lots of interrupts on the same IRQ.
> There are boot params to balance out the IRQs in PIC mode,
> but what you want to do on this system is enable the IOAPIC
> in your kernel config. The existence of the MADT in your
> ACPI tables suggests you may have one. An IOAPIC will bring
> additional interrupt pins to bear, usually allowing
> the PCI interrupts to use IRQs > 16 where they may
> not have to share so much.

Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
hint. :)

cu
Oliver
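For watching counters such as MIS over time, the /proc/interrupts listing quoted above can be read mechanically. The following is a rough sketch, assuming the single-CPU layout shown in this thread (one count column); the function name is made up for illustration and the sample text is copied from the message above.

```python
def parse_interrupts(text):
    """Parse single-CPU /proc/interrupts text into {label: (count, description)}.

    Assumes exactly one count column, as in the listing from this thread.
    """
    counters = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line such as "CPU0" has no colon
        fields = rest.split(None, 1)
        if not fields or not fields[0].isdigit():
            continue
        # Collapse runs of spaces so the description is easy to compare.
        description = " ".join(fields[1].split()) if len(fields) > 1 else ""
        counters[label.strip()] = (int(fields[0]), description)
    return counters


# Listing copied from the message above.
SAMPLE = """\
           CPU0
  0:     112337    IO-APIC-edge  timer
  1:          2    IO-APIC-edge  keyboard
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:       9296    IO-APIC-edge  ide0
 15:       9078    IO-APIC-edge  ide1
 17:         24   IO-APIC-level  eth1
 18:     125085   IO-APIC-level  eth0
 21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
 22:          0   IO-APIC-level  via82cxxx
 23:       2976   IO-APIC-level  eth2
NMI:          0
LOC:     112313
ERR:          0
MIS:         42
"""

if __name__ == "__main__":
    counters = parse_interrupts(SAMPLE)
    print("MIS count:", counters["MIS"][0])
    print("eth0 line:", counters["18"])
```

On a live machine the text would come from reading /proc/interrupts; a multi-CPU system has one count column per CPU, which this sketch does not handle.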
* Re: eth*: transmit timed out since .27
From: Oliver Feiler @ 2004-08-16 19:08 UTC
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

Oliver Feiler wrote:
>
>
> Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
> Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
> hint. :)

No, not quite. After about 30 minutes of uptime and a moderate load on
eth0 (100-200KB/s constant data flow) it happened again. :(

Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x97, t=36.
Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=141.
Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=545.
[repeating endlessly]

I've booted a kernel without APIC and IOAPIC compiled in and it works
again.

I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really
know where to look for the problem here. The interrupt counter for the
IRQ eth0 is using (a Realtek 8029 chipset) is growing significantly
after a while. And after a while it seems to get stuck (Tx timed out).
"ifconfig eth0 down" and "up" again did nothing. Sometimes it seems to
fix such network problems.

cu
Oliver

[-- Attachment #2: dmesg-2.4.27-ioapic.gz --]
[-- Type: application/x-gzip, Size: 4878 bytes --]
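To compare occurrences of the "Tx timed out" message across boots, the numeric fields can be pulled out of the log lines. This is a small sketch, assuming the exact message format quoted above; it only extracts TSR, ISR and t as numbers and makes no attempt to interpret the register bits. The function and pattern names are illustrative.

```python
import re

# Matches the driver message exactly as it appears in the log excerpt above.
TX_TIMEOUT = re.compile(
    r"(?P<iface>eth\d+): Tx timed out, lost interrupt\? "
    r"TSR=(?P<tsr>0x[0-9a-fA-F]+), ISR=(?P<isr>0x[0-9a-fA-F]+), t=(?P<t>\d+)\."
)


def parse_tx_timeouts(log_text):
    """Return one dict per 'Tx timed out' message found in `log_text`."""
    events = []
    for match in TX_TIMEOUT.finditer(log_text):
        events.append({
            "iface": match.group("iface"),
            "tsr": int(match.group("tsr"), 16),
            "isr": int(match.group("isr"), 16),
            "t": int(match.group("t")),
        })
    return events


# Log lines copied from the message above.
SAMPLE_LOG = """\
Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x97, t=36.
Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=141.
Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=545.
"""

if __name__ == "__main__":
    for event in parse_tx_timeouts(SAMPLE_LOG):
        print(event)
```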
* Re: eth*: transmit timed out since .27
From: Len Brown @ 2004-08-16 19:50 UTC
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 2004-08-16 at 15:08, Oliver Feiler wrote:
> Oliver Feiler wrote:
> >
> >
> > Ok, I've turned on the IOAPIC and it seems to work perfectly fine.
> > Except for that IRQ 255 thing I've noticed no oddities. Thanks for the
> > hint. :)
>
> No, not quite. After about 30 minutes of uptime and a moderate load of
> eth0 (100-200KB/s constant data flow) it happened again. :(
>
> Aug 16 21:03:13 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x97, t=36.
> Aug 16 21:03:15 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x3, t=141.
> Aug 16 21:03:23 spot kernel: eth0: Tx timed out, lost interrupt?
> TSR=0x3, ISR=0x3, t=545.
> [repeating endlessly]
>
> I've booted a kernel without APIC and IOAPIC compiled and it works
> again.
>
> I'm attaching a dmesg from a boot with IOAPIC enabled. I don't really
> know where to look for the problem here. The interrupt counter for the
> IRQ eth0 is using (a Realtek 8029 chipset) is growing significantly
> after a while. And after a while it seems to get stuck (Tx timed out).
> "ifconfig eth0 down" and "up" again did nothing. Sometimes it seems to
> fix such network problems.

You've got 3 ethernet controllers.

eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.

And eth0 is failing.
See if you can give its network cable and its IRQ to one of the other
devices and see if the error follows the load and the wires,
or stays with the device.

The quirks for this hardware look totally broken in IOAPIC mode:
PCI: Via IRQ fixup for 00:10.2, from 10 to 5
PCI: Via IRQ fixup for 00:10.1, from 10 to 5
PCI: Via IRQ fixup for 00:10.0, from 11 to 5
I have no idea if they're a nop or not, but you might experiment with
disabling them. Sure isn't obvious that something called
quirk_via_irqpic() should be running in IOAPIC mode.
I'd try disabling quirk_via_acpi() too.

cheers,
-Len

ps. to exchange IRQs, you'll need to physically exchange the slots
of the cards, easy enough unless eth0 is soldered onto the
motherboard;-)
* Re: eth*: transmit timed out since .27
From: Oliver Feiler @ 2004-08-16 23:04 UTC
To: Len Brown; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

Hi Len,

Len Brown wrote:
>
>
> You've got 3 ethernet controllers.
>
> eth0: RealTek RTL-8029 found at 0xe800, IRQ 18, 00:00:E8:5C:2D:AA.
> eth1: SiS 900 PCI Fast Ethernet at 0xec00, IRQ 17, 00:c0:ca:16:4c:b6.
> eth2: VIA VT6102 Rhine-II at 0xd400, 00:0b:6a:2b:48:84, IRQ 23.

Correct.

>
> And eth0 is failing.
> See if you can give its network cable and its IRQ to one of the other
> devices and see if the error follows the load and the wires,
> or stays with the device.

Doing that is a bit problematic. eth0 is a 10mbit NIC, eth1 and eth2
must be 100mbit unfortunately. I can move around (two of) the NICs in
the PCI slots however. The box is headless and a bit uncomfortable to
work with, so I'd like to try software solutions first.

>
> The quirks for this hardware look totally broken in IOAPIC mode:
> PCI: Via IRQ fixup for 00:10.2, from 10 to 5
> PCI: Via IRQ fixup for 00:10.1, from 10 to 5
> PCI: Via IRQ fixup for 00:10.0, from 11 to 5
> I have no idea if they're a nop or not, but you might experiment with
> disabling them. Sure isn't obvious that something called
> quirk_via_irqpic() should be running in IOAPIC mode.
> I'd try disabling quirk_via_acpi() too.

Ok, I've removed the quirks from quirks.c, compiled and rebooted. I hope
I have done it right; I commented out these lines in quirks.c:

//      { PCI_FIXUP_HEADER, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_3, quirk_via_acpi },
//      { PCI_FIXUP_HEADER, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_4, quirk_via_acpi },
//      { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irqpic },
//      { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irqpic },
//      { PCI_FIXUP_FINAL, PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_6, quirk_via_irqpic },

The "Via IRQ fixup for dev:..." messages are gone from the boot messages.

After transferring about 250 MB over eth0 the "Tx timed out" error
recurred. /proc/interrupts looked like this:

           CPU0
  0:     191473    IO-APIC-edge  timer
  1:       1244    IO-APIC-edge  keyboard
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:      33547    IO-APIC-edge  ide0
 15:      23121    IO-APIC-edge  ide1
 17:       5699   IO-APIC-level  eth1
 18:     234589   IO-APIC-level  eth0
 21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
 22:          0   IO-APIC-level  via82cxxx
 23:     240873   IO-APIC-level  eth2
NMI:          0
LOC:     191481
ERR:          0
MIS:          8

What exactly is MIS? Something like "interrupt occurred, but I have no
idea what device caused it"? I don't know much about it, but it's always
>0 when the problem happens.

>
> cheers,
> -Len
>
> ps. to exchange IRQs, you'll need to physically exchange the slots
> of the cards, easy enough unless eth0 is soldered onto the
> motherboard;-)

Fortunately only eth2 (the VIA Rhine-II) is soldered onto the board. :)
I'll try reordering the NICs in the PCI slots. The system is used most
of the time though, so I can't take it apart and test things all the time.

I wonder if it makes sense to experiment with the IOAPIC further. Maybe
the hardware is just plain broken? Or might there be a slight chance to
get this to work the way it's intended to?

Btw, I don't know if I've ever mentioned it, it's an Asrock K7VM4 board.
lspci output is here if it might be of interest:

kiza@spot:~> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8378 [KM400] Chipset Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 02)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:10.0 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0 controller] (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8378 [S3 UniChrome] Integrated Video (rev 01)

Thanks for your help with this. :)

Oliver
* Re: eth*: transmit timed out since .27
From: Maciej W. Rozycki @ 2004-08-16 23:42 UTC
To: Oliver Feiler; +Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Tue, 17 Aug 2004, Oliver Feiler wrote:

> MIS: 8
>
> What exactly is MIS? Something like "interrupt occurred, but I have no
> idea what device caused it"? I don't know much about it, but it's always
> >0 when the problem happens.

 It's a trigger mode MISmatch. It only happens for level-triggered
interrupts and the problem is they get recorded as edge-triggered ones in
the receiving local APIC. The two interrupt trigger modes require the
hardware to perform different actions when the software interrupt handler
concludes and such a mismatch would lead to a lock-up of the affected
line. Specifically, the local APIC involved sends an End Of Interrupt
(EOI) message to the originating I/O APIC for level-triggered interrupts
and for edge-triggered interrupts nothing is sent.

 Fortunately, just before sending the final ACK to the hardware at the
conclusion of the handler, we can detect that the trigger mode recorded by
the local APIC disagrees with the setup of the corresponding I/O APIC
line. If that happens, we execute an (expensive) unlock action at the I/O
APIC so that it resets its logic for the input as if it received an EOI
message from a local APIC for a level-triggered interrupt.

  Maciej
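The mechanism described above can be pictured with a toy model. The sketch below is not kernel code and does not mirror any kernel data structure; the class, field and function names are all hypothetical. It only reproduces the logic as explained in the message: a level-triggered line stays blocked until it sees an EOI, the local APIC sends that EOI only if it recorded the interrupt as level-triggered, and a detected disagreement is counted and followed by an explicit reset of the line.

```python
from dataclasses import dataclass


@dataclass
class ToyIoApicLine:
    """Hypothetical stand-in for one I/O APIC input (not a kernel structure)."""
    level_triggered: bool
    awaiting_eoi: bool = False  # a level-triggered line stays blocked until EOI
    mismatches: int = 0         # plays the role of the MIS counter


def service_interrupt(line, recorded_as_level):
    """Model one interrupt from delivery to the end of its handler.

    `recorded_as_level` is the trigger mode the local APIC recorded, which in
    the failure case differs from how the line is actually configured.
    """
    if line.level_triggered:
        line.awaiting_eoi = True  # line waits for an EOI before firing again

    # ... the driver's interrupt handler would run here ...

    if recorded_as_level:
        # Recorded as level-triggered: the local APIC sends an EOI message.
        line.awaiting_eoi = False
    elif line.level_triggered:
        # Recorded as edge, configured as level: no EOI would ever arrive, so
        # count the mismatch and reset the line explicitly (the workaround).
        line.mismatches += 1
        line.awaiting_eoi = False


if __name__ == "__main__":
    eth0_line = ToyIoApicLine(level_triggered=True)
    service_interrupt(eth0_line, recorded_as_level=True)   # normal case
    service_interrupt(eth0_line, recorded_as_level=False)  # mismatch case
    print("mismatches:", eth0_line.mismatches, "blocked:", eth0_line.awaiting_eoi)
```

Without the workaround branch, the second call would leave awaiting_eoi set forever, which corresponds to the locked-up line described in the message.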
* Re: eth*: transmit timed out since .27
From: Alan Cox @ 2004-08-17 0:29 UTC
To: Oliver Feiler
Cc: Len Brown, Marcelo Tosatti, Marcelo Tosatti, Linux Kernel Mailing List

Looking over the docs the whole ACPI and IOAPIC mode for these boards
seems very different and quite "magic" compared to the PCI mode which is
merely "odd" in a few places. APIC routing bits are stuffed into strange
chipset specific places which implies the quirks probably shouldn't be
applied in acpi mode.
* Re: eth*: transmit timed out since .27
From: Len Brown @ 2004-08-16 19:38 UTC
To: Oliver Feiler; +Cc: Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 2004-08-16 at 14:44, Oliver Feiler wrote:

>  14:       9296    IO-APIC-edge  ide0
>  15:       9078    IO-APIC-edge  ide1
>  17:         24   IO-APIC-level  eth1
>  18:     125085   IO-APIC-level  eth0
>  21:          0   IO-APIC-level  usb-uhci, usb-uhci, usb-uhci
>  22:          0   IO-APIC-level  via82cxxx
>  23:       2976   IO-APIC-level  eth2
> NMI:          0
> LOC:     112313
> ERR:          0
> MIS:         42

This is unusual.
MIS is a hardware workaround and should normally be 0.

>
>
> vs.
>
> 00:11.1 IDE interface: VIA Technologies, Inc.
> VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
> (prog-if 8a [Master SecP PriP])
>          Subsystem: Unknown device 1849:0571
>          Flags: bus master, medium devsel, latency 32, IRQ 255
>          I/O ports at fc00 [size=16]
>          Capabilities: <available only to root>
>
> This probably has to do with this boot message:
> PCI: No IRQ known for interrupt pin A of device 00:11.1
> I have found absolutely nothing that explains if this is an error or
> just some sort of debug message one can ignore.

Yes, ignore it. This is where that message about 255 came from.
When ACPI failed to find a PCI-routing-table entry for this device,
it looked in PCI config space and found the 255 you see above.
The only recent change is that it doesn't try to use an obviously
bogus value. But in either case, with this device it is moot
as the hardware and the driver are hard-coded.

-Len
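The 255 mentioned above comes from the Interrupt Line register in the device's PCI configuration header. As a hedged sketch: in the standard header layout that register is the byte at offset 0x3C and the Interrupt Pin register is the byte at 0x3D, and on x86 a line value of 0xFF conventionally means "unknown or not connected". The helper name and the synthetic header below are made up for illustration.

```python
INTERRUPT_LINE_OFFSET = 0x3C  # Interrupt Line register in the PCI header
INTERRUPT_PIN_OFFSET = 0x3D   # Interrupt Pin register (1 = INTA# ... 4 = INTD#)
NO_IRQ = 0xFF                 # conventional "unknown / not connected" on x86

PIN_NAMES = {1: "A", 2: "B", 3: "C", 4: "D"}


def interrupt_info(config):
    """Return the IRQ line (None if 0xFF) and pin letter (None if unused)
    from the first bytes of a device's PCI configuration space."""
    if len(config) <= INTERRUPT_PIN_OFFSET:
        raise ValueError("need at least 62 bytes of configuration space")
    line = config[INTERRUPT_LINE_OFFSET]
    pin = config[INTERRUPT_PIN_OFFSET]
    return {
        "line": None if line == NO_IRQ else line,
        "pin": PIN_NAMES.get(pin),
    }


def synthetic_header(line, pin):
    """Build a 64-byte all-zero header with only the two interrupt bytes set.
    This is test data, not a dump from the machine discussed in this thread."""
    header = bytearray(64)
    header[INTERRUPT_LINE_OFFSET] = line
    header[INTERRUPT_PIN_OFFSET] = pin
    return bytes(header)


if __name__ == "__main__":
    # Mirrors the reported situation: pin A present, line register reads 255.
    print(interrupt_info(synthetic_header(0xFF, 1)))
```

Treating 0xFF as "no IRQ" rather than as IRQ 255 matches the behaviour Len describes, where the kernel no longer tries to use an obviously bogus value.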
* Re: eth*: transmit timed out since .27
From: Maciej W. Rozycki @ 2004-08-16 20:11 UTC
To: Len Brown; +Cc: Oliver Feiler, Marcelo Tosatti, Marcelo Tosatti, linux-kernel

On Mon, 16 Aug 2004, Len Brown wrote:

> > MIS:         42
>
> This is unusual.
> MIS is a hardware workaround and should normally be 0.

 Unfortunately these events seem to be triggerable for all systems using
serial APIC interrupt delivery. All that is needed is a sufficiently high
load on interrupts, even a transient one. Admittedly the definition of
"sufficient" here is very high, something like at least ten thousands of
interrupts per second.

 E.g. I've been able to observe a few of them on my system when a UDP NFS
client was untarring an archive over a 100Mbps network -- both the archive
and the destination were located in an NFS mounted filesystem and the size
of the untarred data was around 300MB. The APIC hardware is rock-solid
there -- after many years of operation I have yet to see a single APIC
error.

 One "reliable" way of triggering these events is configuring the PIT
timer interrupt input as level-triggered in the I/O APIC. ;-) This is
actually how I did run-time testing of this code.

  Maciej
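For a rough sense of scale against the rates mentioned above, the /proc/interrupts figures posted in this thread allow a back-of-the-envelope average. The sketch assumes the timer on IRQ 0 ticks at HZ=100 (the usual default for 2.4 on x86, but an assumption here), so the timer count divided by 100 approximates uptime in seconds. It yields an average since boot only, which says nothing about short bursts.

```python
HZ = 100  # assumed timer tick rate; the usual default for 2.4 on x86


def average_rate(irq_count, timer_count, hz=HZ):
    """Average interrupts per second since boot, using the IRQ 0 timer count
    as a clock (timer_count / hz is roughly the uptime in seconds)."""
    uptime_seconds = timer_count / hz
    return irq_count / uptime_seconds


# Counts copied from the two /proc/interrupts listings posted in this thread.
FIRST_SNAPSHOT = {"timer": 112337, "eth0": 125085}
SECOND_SNAPSHOT = {"timer": 191473, "eth0": 234589, "eth2": 240873}

if __name__ == "__main__":
    for name, snap in (("first", FIRST_SNAPSHOT), ("second", SECOND_SNAPSHOT)):
        for dev in ("eth0", "eth2"):
            if dev in snap:
                rate = average_rate(snap[dev], snap["timer"])
                print(f"{name} snapshot, {dev}: about {rate:.0f} interrupts/s on average")
```

Under that assumption the averages come out at roughly a hundred interrupts per second per device, so any rate in the range Maciej mentions would have to come from transient bursts rather than the sustained load.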
Thread overview: 9+ messages
[not found] <566B962EB122634D86E6EE29E83DD808182C3236@hdsmsx403.hd.intel.com>
2004-08-16 17:52 ` eth*: transmit timed out since .27 (was: linux-2.4.27 released) Len Brown
2004-08-16 18:44   ` eth*: transmit timed out since .27 Oliver Feiler
2004-08-16 19:08     ` Oliver Feiler
2004-08-16 19:50       ` Len Brown
2004-08-16 23:04         ` Oliver Feiler
2004-08-16 23:42           ` Maciej W. Rozycki
2004-08-17  0:29           ` Alan Cox
2004-08-16 19:38     ` Len Brown
2004-08-16 20:11       ` Maciej W. Rozycki