* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
@ 2001-01-12 17:16 Manfred Spraul
2001-01-12 17:33 ` Frank de Lange
2001-01-12 17:49 ` Alan Cox
0 siblings, 2 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 17:16 UTC (permalink / raw)
To: dwmw2, linux-kernel, frank
>
> manfred@colorfullife.com said:
> > IRR for interrupt 19 is set, that means the IO APIC has sent the
> > interrupt to a cpu but not yet received the corresponding EOI.
>
> OK, but couldn't we reset it by sending an extra EOI when the drivers
> decide that they've missed interrupts?
How?
You send an EOI by writing 0 to the EOI register of the local apic, and
then the local apic automagically checks its ISR bitfield.
It takes the highest set bit and clears it. Then it checks that bit in
the TMR, and if it's also set in the TMR then it sends an EOI to the IO
apic.
The magic seems to be tamper proof: all bits are read only.
The bit on the IO apic is also read only.
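The EOI sequence described above can be sketched as a small user-space model. This is only an illustration of the steps as stated in this mail, not a description of the real silicon; the struct and function names (lapic_model, lapic_model_eoi, set_vector_bit) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a local APIC: 256 vectors, one bit each in ISR and TMR. */
struct lapic_model {
    uint32_t isr[8];   /* in-service bits, set while a handler is running */
    uint32_t tmr[8];   /* trigger-mode bits, set for level-triggered vectors */
};

/* Helper for setting up the model; real ISR/TMR bits are read-only. */
static void set_vector_bit(uint32_t *field, int vec)
{
    assert(vec >= 0 && vec < 256);
    field[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Return the highest vector whose ISR bit is set, or -1 if none is set. */
static int highest_in_service(const struct lapic_model *apic)
{
    for (int vec = 255; vec >= 0; vec--)
        if (apic->isr[vec / 32] & (UINT32_C(1) << (vec % 32)))
            return vec;
    return -1;
}

/*
 * Model of a write to the EOI register. The write carries no vector
 * number; the APIC picks the vector itself from the ISR.
 * Returns the vector for which an EOI would be forwarded to the IO APIC,
 * or -1 if nothing is forwarded (ISR empty, or the vector is edge-triggered).
 */
static int lapic_model_eoi(struct lapic_model *apic)
{
    int vec = highest_in_service(apic);

    if (vec < 0)
        return -1;                      /* nothing in service */
    apic->isr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
    if (apic->tmr[vec / 32] & (UINT32_C(1) << (vec % 32)))
        return vec;                     /* level-triggered: forward the EOI */
    return -1;
}
```

In this model an extra EOI issued while the ISR is empty forwards nothing, which is the reason the "How?" above has no obvious answer: software cannot name the vector it wants acknowledged.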
Perhaps with brute force? Switch the interrupt to edge triggered on the
io apic, wait 1 usec, switch it back to level triggered. The IRR bit is
undefined for edge triggered interrupts, perhaps that clears the IRR
bit.
I would first concentrate on the differences between 2.2 and 2.4:
Frank, could you try what happens with the NMI oopser disabled?
The second major difference I'm immediately aware of is the number of
the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
2.4 the highest priority.
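To make the priority remark concrete: on the local APIC the upper four bits of a vector number are its priority class. A minimal sketch follows; the helper names are invented for illustration. 0x99 is the vector listed for pin 19 in the dumps later in this thread and 0xfc is the vector in the low byte of the ICR value 000008fc shown there, while 0x30 is only an assumed example of a low vector.

```c
#include <assert.h>

/* Priority class of an interrupt vector: bits 7..4 of the vector number. */
static unsigned int vector_priority_class(unsigned int vector)
{
    assert(vector <= 0xff);
    return vector >> 4;
}

/* Nonzero if vector a sits in a strictly higher priority class than b. */
static int vector_outranks(unsigned int a, unsigned int b)
{
    return vector_priority_class(a) > vector_priority_class(b);
}
```

With these helpers, vector_outranks(0xfc, 0x99) is true and vector_outranks(0x30, 0x99) is false, so an IPI on a high vector is preferred over the NIC interrupt while one on a low vector is not.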
--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? 2001-01-12 17:16 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Manfred Spraul @ 2001-01-12 17:33 ` Frank de Lange 2001-01-12 17:51 ` Manfred Spraul 2001-01-12 17:49 ` Alan Cox 1 sibling, 1 reply; 90+ messages in thread From: Frank de Lange @ 2001-01-12 17:33 UTC (permalink / raw) To: Manfred Spraul; +Cc: dwmw2, linux-kernel On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote: > I would first concentrate on the differences between 2.2 and 2.4: > > Frank, could you try what happens with the NMI oopser disabled? Here's the results with nmi_watchdog=0 Before network hang (nmi_watchdog=0) ==================================== Jan 12 18:24:43 behemoth kernel: SysRq: Jan 12 18:24:43 behemoth kernel: print_PIC() Jan 12 18:24:43 behemoth kernel: Jan 12 18:24:43 behemoth kernel: printing PIC contents Jan 12 18:24:43 behemoth kernel: ... PIC IMR: fffa Jan 12 18:24:43 behemoth kernel: ... PIC IRR: 0000 Jan 12 18:24:43 behemoth kernel: ... PIC ISR: 0000 Jan 12 18:24:43 behemoth kernel: ... PIC ELCR: 1e00 Jan 12 18:24:43 behemoth kernel: print_IO_APIC() Jan 12 18:24:43 behemoth kernel: number of MP IRQ sources: 23. Jan 12 18:24:43 behemoth kernel: number of IO-APIC #2 registers: 24. Jan 12 18:24:43 behemoth kernel: testing the IO APIC....................... Jan 12 18:24:43 behemoth kernel: Jan 12 18:24:43 behemoth kernel: IO APIC #2...... Jan 12 18:24:43 behemoth kernel: .... register #00: 02000000 Jan 12 18:24:43 behemoth kernel: ....... : physical APIC id: 02 Jan 12 18:24:43 behemoth kernel: .... register #01: 00170011 Jan 12 18:24:43 behemoth kernel: ....... : max redirection entries: 0017 Jan 12 18:24:43 behemoth kernel: ....... : IO APIC version: 0011 Jan 12 18:24:43 behemoth kernel: .... register #02: 00000000 Jan 12 18:24:43 behemoth kernel: ....... : arbitration: 00 Jan 12 18:24:43 behemoth kernel: .... 
IRQ redirection table: Jan 12 18:24:43 behemoth kernel: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: Jan 12 18:24:43 behemoth kernel: 00 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 01 003 03 0 0 0 0 0 1 1 39 Jan 12 18:24:43 behemoth kernel: 02 003 03 0 0 0 0 0 1 1 31 Jan 12 18:24:43 behemoth kernel: 03 003 03 0 0 0 0 0 1 1 41 Jan 12 18:24:43 behemoth kernel: 04 003 03 0 0 0 0 0 1 1 49 Jan 12 18:24:43 behemoth kernel: 05 003 03 0 0 0 0 0 1 1 51 Jan 12 18:24:43 behemoth kernel: 06 003 03 0 0 0 0 0 1 1 59 Jan 12 18:24:43 behemoth kernel: 07 003 03 0 0 0 0 0 1 1 61 Jan 12 18:24:43 behemoth kernel: 08 003 03 0 0 0 0 0 1 1 69 Jan 12 18:24:43 behemoth kernel: 09 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 0a 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 0b 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 0c 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 0d 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 0e 003 03 0 0 0 0 0 1 1 71 Jan 12 18:24:43 behemoth kernel: 0f 003 03 0 0 0 0 0 1 1 79 Jan 12 18:24:43 behemoth kernel: 10 003 03 0 1 0 1 0 1 1 81 Jan 12 18:24:43 behemoth kernel: 11 003 03 0 1 0 1 0 1 1 89 Jan 12 18:24:43 behemoth kernel: 12 003 03 0 1 0 1 0 1 1 91 Jan 12 18:24:43 behemoth kernel: 13 003 03 0 1 0 1 0 1 1 99 Jan 12 18:24:43 behemoth kernel: 14 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 15 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 16 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: 17 000 00 1 0 0 0 0 0 0 00 Jan 12 18:24:43 behemoth kernel: IRQ to pin mappings: Jan 12 18:24:43 behemoth kernel: IRQ0 -> 2 Jan 12 18:24:43 behemoth kernel: IRQ1 -> 1 Jan 12 18:24:43 behemoth kernel: IRQ3 -> 3 Jan 12 18:24:43 behemoth kernel: IRQ4 -> 4 Jan 12 18:24:43 behemoth kernel: IRQ5 -> 5 Jan 12 18:24:43 behemoth kernel: IRQ6 -> 6 Jan 12 18:24:43 behemoth kernel: IRQ7 -> 7 Jan 12 18:24:43 behemoth kernel: IRQ8 -> 8 Jan 12 18:24:43 behemoth kernel: IRQ13 -> 13 Jan 12 
18:24:43 behemoth kernel: IRQ14 -> 14 Jan 12 18:24:43 behemoth kernel: IRQ15 -> 15 Jan 12 18:24:43 behemoth kernel: IRQ16 -> 16 Jan 12 18:24:43 behemoth kernel: IRQ17 -> 17 Jan 12 18:24:43 behemoth kernel: IRQ18 -> 18 Jan 12 18:24:43 behemoth kernel: IRQ19 -> 19 Jan 12 18:24:43 behemoth kernel: .................................... done. Jan 12 18:24:43 behemoth kernel: print_all_local_APICs() Jan 12 18:24:43 behemoth kernel: Jan 12 18:24:43 behemoth kernel: printing local APIC contents on CPU#1/1: Jan 12 18:24:43 behemoth kernel: ... APIC ID: 01000000 (1) Jan 12 18:24:43 behemoth kernel: ... APIC VERSION: 00040011 Jan 12 18:24:43 behemoth kernel: ... APIC TASKPRI: 00000000 (00) Jan 12 18:24:43 behemoth kernel: ... APIC ARBPRI: 00000000 (00) Jan 12 18:24:43 behemoth kernel: ... APIC PROCPRI: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC EOI: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC LDR: 02000000 Jan 12 18:24:43 behemoth kernel: ... APIC DFR: ffffffff Jan 12 18:24:43 behemoth kernel: ... APIC SPIV: 000003ff Jan 12 18:24:43 behemoth kernel: ... APIC ISR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 7 times Jan 12 18:24:43 behemoth kernel: ... APIC TMR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 3 times Jan 12 18:24:43 behemoth kernel: 00000000010000000000000001000000 Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 2 times Jan 12 18:24:43 behemoth kernel: ... APIC IRR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 7 times Jan 12 18:24:43 behemoth kernel: ... 
APIC ESR: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC ICR: 000008fc Jan 12 18:24:43 behemoth kernel: ... APIC ICR2: 01000000 Jan 12 18:24:43 behemoth kernel: ... APIC LVTT: 000200ef Jan 12 18:24:43 behemoth kernel: ... APIC LVTPC: 00010000 Jan 12 18:24:43 behemoth kernel: ... APIC LVT0: 00010700 Jan 12 18:24:43 behemoth kernel: ... APIC LVT1: 00010400 Jan 12 18:24:43 behemoth kernel: ... APIC LVTERR: 000000fe Jan 12 18:24:43 behemoth kernel: ... APIC TMICT: 0000a322 Jan 12 18:24:43 behemoth kernel: ... APIC TMCCT: 00000be6 Jan 12 18:24:43 behemoth kernel: ... APIC TDCR: 00000003 Jan 12 18:24:43 behemoth kernel: Jan 12 18:24:43 behemoth kernel: Jan 12 18:24:43 behemoth kernel: printing local APIC contents on CPU#0/0: Jan 12 18:24:43 behemoth kernel: ... APIC ID: 00000000 (0) Jan 12 18:24:43 behemoth kernel: ... APIC VERSION: 00040011 Jan 12 18:24:43 behemoth kernel: ... APIC TASKPRI: 00000000 (00) Jan 12 18:24:43 behemoth kernel: ... APIC ARBPRI: 000000e0 (e0) Jan 12 18:24:43 behemoth kernel: ... APIC PROCPRI: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC EOI: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC LDR: 01000000 Jan 12 18:24:43 behemoth kernel: ... APIC DFR: ffffffff Jan 12 18:24:43 behemoth kernel: ... APIC SPIV: 000003ff Jan 12 18:24:43 behemoth kernel: ... APIC ISR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 7 times Jan 12 18:24:43 behemoth kernel: ... APIC TMR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 3 times Jan 12 18:24:43 behemoth kernel: 00000000010000000000000001000000 Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 2 times Jan 12 18:24:43 behemoth kernel: ... 
APIC IRR field: Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth kernel: 00000000000000000100000000000000 Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:24:43 behemoth last message repeated 4 times Jan 12 18:24:43 behemoth kernel: 00000000000000010000000000000000 Jan 12 18:24:43 behemoth kernel: ... APIC ESR: 00000000 Jan 12 18:24:43 behemoth kernel: ... APIC ICR: 000c08fb Jan 12 18:24:43 behemoth kernel: ... APIC ICR2: 02000000 Jan 12 18:24:43 behemoth kernel: ... APIC LVTT: 000200ef Jan 12 18:24:43 behemoth kernel: ... APIC LVTPC: 00010000 Jan 12 18:24:43 behemoth kernel: ... APIC LVT0: 00010700 Jan 12 18:24:43 behemoth kernel: ... APIC LVT1: 00000400 Jan 12 18:24:43 behemoth kernel: ... APIC LVTERR: 000000fe Jan 12 18:24:43 behemoth kernel: ... APIC TMICT: 0000a322 Jan 12 18:24:43 behemoth kernel: ... APIC TMCCT: 000041e1 Jan 12 18:24:43 behemoth kernel: ... APIC TDCR: 00000003 After network hang (nmi_watchdog=0) =================================== Jan 12 18:26:21 behemoth kernel: SysRq: Jan 12 18:26:21 behemoth kernel: print_PIC() Jan 12 18:26:21 behemoth kernel: Jan 12 18:26:21 behemoth kernel: printing PIC contents Jan 12 18:26:21 behemoth kernel: ... PIC IMR: fffa Jan 12 18:26:21 behemoth kernel: ... PIC IRR: 0000 Jan 12 18:26:21 behemoth kernel: ... PIC ISR: 0000 Jan 12 18:26:21 behemoth kernel: ... PIC ELCR: 1e00 Jan 12 18:26:21 behemoth kernel: print_IO_APIC() Jan 12 18:26:21 behemoth kernel: number of MP IRQ sources: 23. Jan 12 18:26:21 behemoth kernel: number of IO-APIC #2 registers: 24. Jan 12 18:26:21 behemoth kernel: testing the IO APIC....................... Jan 12 18:26:21 behemoth kernel: Jan 12 18:26:21 behemoth kernel: IO APIC #2...... Jan 12 18:26:21 behemoth kernel: .... register #00: 02000000 Jan 12 18:26:21 behemoth kernel: ....... : physical APIC id: 02 Jan 12 18:26:21 behemoth kernel: .... 
register #01: 00170011 Jan 12 18:26:21 behemoth kernel: ....... : max redirection entries: 0017 Jan 12 18:26:21 behemoth kernel: ....... : IO APIC version: 0011 Jan 12 18:26:21 behemoth kernel: .... register #02: 00000000 Jan 12 18:26:21 behemoth kernel: ....... : arbitration: 00 Jan 12 18:26:21 behemoth kernel: .... IRQ redirection table: Jan 12 18:26:21 behemoth kernel: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: Jan 12 18:26:21 behemoth kernel: 00 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 01 003 03 0 0 0 0 0 1 1 39 Jan 12 18:26:21 behemoth kernel: 02 003 03 0 0 0 0 0 1 1 31 Jan 12 18:26:21 behemoth kernel: 03 003 03 0 0 0 0 0 1 1 41 Jan 12 18:26:21 behemoth kernel: 04 003 03 0 0 0 0 0 1 1 49 Jan 12 18:26:21 behemoth kernel: 05 003 03 0 0 0 0 0 1 1 51 Jan 12 18:26:21 behemoth kernel: 06 003 03 0 0 0 0 0 1 1 59 Jan 12 18:26:21 behemoth kernel: 07 003 03 0 0 0 0 0 1 1 61 Jan 12 18:26:21 behemoth kernel: 08 003 03 0 0 0 0 0 1 1 69 Jan 12 18:26:21 behemoth kernel: 09 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 0a 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 0b 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 0c 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 0d 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 0e 003 03 0 0 0 0 0 1 1 71 Jan 12 18:26:21 behemoth kernel: 0f 003 03 0 0 0 0 0 1 1 79 Jan 12 18:26:21 behemoth kernel: 10 003 03 0 1 0 1 0 1 1 81 Jan 12 18:26:21 behemoth kernel: 11 003 03 0 1 0 1 0 1 1 89 Jan 12 18:26:21 behemoth kernel: 12 003 03 0 1 0 1 0 1 1 91 Jan 12 18:26:21 behemoth kernel: 13 003 03 0 1 1 1 0 1 1 99 Jan 12 18:26:21 behemoth kernel: 14 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 15 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 16 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: 17 000 00 1 0 0 0 0 0 0 00 Jan 12 18:26:21 behemoth kernel: IRQ to pin mappings: Jan 12 18:26:21 behemoth kernel: IRQ0 -> 2 Jan 12 18:26:21 behemoth kernel: IRQ1 
-> 1 Jan 12 18:26:21 behemoth kernel: IRQ3 -> 3 Jan 12 18:26:21 behemoth kernel: IRQ4 -> 4 Jan 12 18:26:21 behemoth kernel: IRQ5 -> 5 Jan 12 18:26:21 behemoth kernel: IRQ6 -> 6 Jan 12 18:26:21 behemoth kernel: IRQ7 -> 7 Jan 12 18:26:21 behemoth kernel: IRQ8 -> 8 Jan 12 18:26:21 behemoth kernel: IRQ13 -> 13 Jan 12 18:26:21 behemoth kernel: IRQ14 -> 14 Jan 12 18:26:21 behemoth kernel: IRQ15 -> 15 Jan 12 18:26:21 behemoth kernel: IRQ16 -> 16 Jan 12 18:26:21 behemoth kernel: IRQ17 -> 17 Jan 12 18:26:21 behemoth kernel: IRQ18 -> 18 Jan 12 18:26:21 behemoth kernel: IRQ19 -> 19 Jan 12 18:26:21 behemoth kernel: .................................... done. Jan 12 18:26:21 behemoth kernel: print_all_local_APICs() Jan 12 18:26:21 behemoth kernel: Jan 12 18:26:21 behemoth kernel: printing local APIC contents on CPU#1/1: Jan 12 18:26:21 behemoth kernel: ... APIC ID: 01000000 (1) Jan 12 18:26:21 behemoth kernel: ... APIC VERSION: 00040011 Jan 12 18:26:21 behemoth kernel: ... APIC TASKPRI: 00000000 (00) Jan 12 18:26:21 behemoth kernel: ... APIC ARBPRI: 00000000 (00) Jan 12 18:26:21 behemoth kernel: ... APIC PROCPRI: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC EOI: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC LDR: 02000000 Jan 12 18:26:21 behemoth kernel: ... APIC DFR: ffffffff Jan 12 18:26:21 behemoth kernel: ... APIC SPIV: 000003ff Jan 12 18:26:21 behemoth kernel: ... APIC ISR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 7 times Jan 12 18:26:21 behemoth kernel: ... 
APIC TMR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 3 times Jan 12 18:26:21 behemoth kernel: 00000000010000000000000000000000 Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 2 times Jan 12 18:26:21 behemoth kernel: ... APIC IRR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 7 times Jan 12 18:26:21 behemoth kernel: ... APIC ESR: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC ICR: 000008fc Jan 12 18:26:21 behemoth kernel: ... APIC ICR2: 01000000 Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 00010000 Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700 Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 00010400 Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 000000fe Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: 0000a322 Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 00001803 Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 00000003 Jan 12 18:26:21 behemoth kernel: Jan 12 18:26:21 behemoth kernel: Jan 12 18:26:21 behemoth kernel: printing local APIC contents on CPU#0/0: Jan 12 18:26:21 behemoth kernel: ... APIC ID: 00000000 (0) Jan 12 18:26:21 behemoth kernel: ... APIC VERSION: 00040011 Jan 12 18:26:21 behemoth kernel: ... APIC TASKPRI: 00000000 (00) Jan 12 18:26:21 behemoth kernel: ... APIC ARBPRI: 000000e0 (e0) Jan 12 18:26:21 behemoth kernel: ... APIC PROCPRI: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC EOI: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC LDR: 01000000 Jan 12 18:26:21 behemoth kernel: ... APIC DFR: ffffffff Jan 12 18:26:21 behemoth kernel: ... APIC SPIV: 000003ff Jan 12 18:26:21 behemoth kernel: ... 
APIC ISR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 7 times Jan 12 18:26:21 behemoth kernel: ... APIC TMR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 3 times Jan 12 18:26:21 behemoth kernel: 00000000010000000000000001000000 Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 2 times Jan 12 18:26:21 behemoth kernel: ... APIC IRR field: Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000 Jan 12 18:26:21 behemoth last message repeated 6 times Jan 12 18:26:21 behemoth kernel: 00000000000000010000000000000000 Jan 12 18:26:21 behemoth kernel: ... APIC ESR: 00000000 Jan 12 18:26:21 behemoth kernel: ... APIC ICR: 000c08fb Jan 12 18:26:21 behemoth kernel: ... APIC ICR2: 02000000 Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 00010000 Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700 Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 00000400 Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 000000fe Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: 0000a322 Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 00004e26 Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 00000003 -- WWWWW _______________________ ## o o\ / Frank de Lange \ }# \| / \ ##---# _/ <Hacker for Hire> \ #### \ +31-320-252965 / \ frank@unternet.org / ------------------------- [ "Omnis enim res, quae dando non deficit, dum habetur et non datur, nondum habetur, quomodo habenda est." 
] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 17:33 ` Frank de Lange
@ 2001-01-12 17:51 ` Manfred Spraul
2001-01-12 18:25 ` Frank de Lange
0 siblings, 1 reply; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 17:51 UTC (permalink / raw)
To: Frank de Lange; +Cc: dwmw2, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 499 bytes --]
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> > I would first concentrate on the differences between 2.2 and 2.4:
> >
> > Frank, could you try what happens with the NMI oopser disabled?
>
> Here's the results with nmi_watchdog=0
>
>
> After network hang (nmi_watchdog=0)
> ===================================
>
It still hangs.
Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
I'm rebooting with that patch right now.
--
Manfred
[-- Attachment #2: patch-frank --]
[-- Type: text/plain, Size: 1629 bytes --]
1) add to the end of io_apic.c:

static void print_line(struct IO_APIC_route_entry* entry)
{
	printk(KERN_EMERG " %02x %03X %02X ",
		0,
		entry->dest.logical.logical_dest,
		entry->dest.physical.physical_dest
	);
	printk("%1d %1d %1d %1d %1d %1d %1d %02X\n",
		entry->mask,
		entry->trigger,
		entry->irr,
		entry->polarity,
		entry->delivery_status,
		entry->dest_mode,
		entry->delivery_mode,
		entry->vector
	);
}

void kick_IOAPIC_pin(int pin)
{
	unsigned long flags;
	struct IO_APIC_route_entry entry;

	local_irq_save(flags);
	*(((int *)&entry) + 1) = io_apic_read(0, 0x11 + 2 * pin);
	*(((int *)&entry) + 0) = io_apic_read(0, 0x10 + 2 * pin);
	printk(KERN_EMERG " NR Log Phy Mask Trig IRR Pol"
		" Stat Dest Deli Vect: \n");
	printk(KERN_EMERG "Before:\n");
	print_line(&entry);
	entry.trigger = 0;
	io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry) + 1));
	io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry) + 0));
	udelay(10);
	printk(KERN_EMERG "After switching to edge:\n");
	print_line(&entry);
	entry.trigger = 1;
	io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry) + 1));
	io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry) + 0));
	udelay(10);
	printk(KERN_EMERG "After switch back:\n");
	print_line(&entry);
	local_irq_restore(flags);
}

2) add to sysrq.c:

--- 2.4/drivers/char/sysrq.c Mon Dec 4 02:48:19 2000
+++ build-2.4/drivers/char/sysrq.c Fri Jan 12 18:37:57 2001
@@ -137,6 +137,9 @@
 		send_sig_all(SIGKILL, 1);
 		orig_log_level = 8;
 		break;
+	case 'q':
+		kick_IOAPIC_pin(19);
+
 	default: /* Unknown: help */
 		if (kbd)
 			printk("unRaw ");
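The patch above overlays the two 32-bit register halves onto struct IO_APIC_route_entry with pointer casts. The same fields can be picked apart with explicit shifts and masks, which may be easier to follow when reading the table dumps. The sketch below is stand-alone user-space C, not kernel code; the struct and function names are invented, and the bit positions follow the 82093AA redirection-entry layout as I understand it (low word read from register 0x10 + 2 * pin, high word from 0x11 + 2 * pin, as in the patch).

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of one IO APIC redirection table entry (illustrative names). */
struct rte_fields {
    unsigned int vector;          /* bits 0-7   "Vect" */
    unsigned int delivery_mode;   /* bits 8-10  "Deli" */
    unsigned int dest_mode;       /* bit 11     "Dest" */
    unsigned int delivery_status; /* bit 12     "Stat" */
    unsigned int polarity;        /* bit 13     "Pol"  */
    unsigned int irr;             /* bit 14     "IRR"  */
    unsigned int trigger;         /* bit 15     "Trig": 0 = edge, 1 = level */
    unsigned int mask;            /* bit 16     "Mask" */
    unsigned int dest;            /* bits 56-63, i.e. bits 24-31 of the high word */
};

/* Split the low and high register words of an entry into its fields. */
static struct rte_fields rte_decode(uint32_t low, uint32_t high)
{
    struct rte_fields f;

    f.vector          = low & 0xff;
    f.delivery_mode   = (low >> 8) & 0x7;
    f.dest_mode       = (low >> 11) & 0x1;
    f.delivery_status = (low >> 12) & 0x1;
    f.polarity        = (low >> 13) & 0x1;
    f.irr             = (low >> 14) & 0x1;
    f.trigger         = (low >> 15) & 0x1;
    f.mask            = (low >> 16) & 0x1;
    f.dest            = (high >> 24) & 0xff;
    return f;
}

/* Return the low word with the trigger-mode bit forced to edge (0) or level (1). */
static uint32_t rte_set_trigger(uint32_t low, int level)
{
    assert(level == 0 || level == 1);
    return (low & ~(UINT32_C(1) << 15)) | ((uint32_t)level << 15);
}
```

For example, a low word of 0xE999 with a high word of 0x03000000 decodes to vector 0x99, level triggered, IRR set, destination 0x03; rte_set_trigger(0xE999, 0) gives 0x6999, the value that would be written for the temporary switch to edge mode.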
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 17:51 ` Manfred Spraul
@ 2001-01-12 18:25 ` Frank de Lange
2001-01-12 19:04 ` Manfred Spraul
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 18:25 UTC (permalink / raw)
To: Manfred Spraul; +Cc: dwmw2, linux-kernel
On Fri, Jan 12, 2001 at 06:51:36PM +0100, Manfred Spraul wrote:
> Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
> I'm rebooting with that patch right now.
I added the patch, and tried it out. When the network hangs, I am able to
revive it with ALT-SYSRQ-Q. The debug log shows these entries:
Jan 12 19:22:57 behemoth kernel: SysRq: <0> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
Jan 12 19:22:57 behemoth kernel: Before:
Jan 12 19:22:57 behemoth kernel: 00 003 03 0 1 1 1 1 1 1 99
Jan 12 19:22:57 behemoth kernel: After switching to edge:
Jan 12 19:22:57 behemoth kernel: 00 003 03 0 0 1 1 1 1 1 99
Jan 12 19:22:57 behemoth kernel: After switch back:
Jan 12 19:22:57 behemoth kernel: 00 003 03 0 1 1 1 1 1 1 99
--
Frank de Lange
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 18:25 ` Frank de Lange
@ 2001-01-12 19:04 ` Manfred Spraul
2001-01-12 19:07 ` Frank de Lange
2001-01-12 19:21 ` Frank de Lange
0 siblings, 2 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 19:04 UTC (permalink / raw)
To: Frank de Lange; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds
[-- Attachment #1: Type: text/plain, Size: 371 bytes --]
Linus wrote:
> Does this seem to happen mainly with drivers that use "disable_irq()"
> and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> and some others do too (3c59x).
I removed the disable_irq lines from 8390.c, and that fixed the problem:
no hang within 2 minutes - the test is still running.
Frank, could you double check it?
--
Manfred
[-- Attachment #2: patch-frank --]
[-- Type: text/plain, Size: 1620 bytes --]
// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 0
// EXTRAVERSION =
--- 2.4/drivers/net/8390.c Thu Jan 4 22:00:55 2001
+++ build-2.4/drivers/net/8390.c Fri Jan 12 19:53:47 2001
@@ -242,15 +242,15 @@
 	/* Ugly but a reset can be slow, yet must be protected */
-	disable_irq_nosync(dev->irq);
-	spin_lock(&ei_local->page_lock);
+/*	disable_irq_nosync(dev->irq);*/
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 	/* Try to restart the card. Perhaps the user has fixed something. */
 	ei_reset_8390(dev);
 	NS8390_init(dev, 1);
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*	enable_irq(dev->irq); */
 	netif_wake_queue(dev);
 }
@@ -285,9 +285,9 @@
 	 * Slow phase with lock held.
 	 */
-	disable_irq_nosync(dev->irq);
+/*	disable_irq_nosync(dev->irq);*/
-	spin_lock(&ei_local->page_lock);
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 	ei_local->irqlock = 1;
@@ -327,8 +327,8 @@
 		ei_local->irqlock = 0;
 		netif_stop_queue(dev);
 		outb_p(ENISR_ALL, e8390_base + EN0_IMR);
-		spin_unlock(&ei_local->page_lock);
-		enable_irq(dev->irq);
+		spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*		enable_irq(dev->irq);*/
 		ei_local->stat.tx_errors++;
 		return 1;
 	}
@@ -383,8 +383,8 @@
 	ei_local->irqlock = 0;
 	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*	enable_irq(dev->irq); */
 	dev_kfree_skb (skb);
 	ei_local->stat.tx_bytes += send_length;
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:04 ` Manfred Spraul
@ 2001-01-12 19:07 ` Frank de Lange
2001-01-12 19:21 ` Frank de Lange
1 sibling, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 19:07 UTC (permalink / raw)
To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds
On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> Linus wrote:
> > Does this seem to happen mainly with drivers that use "disable_irq()"
> > and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> > and some others do too (3c59x).
>
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
>
> Frank, could you double check it?
Hm, I also sent in a (somewhat different) patch on my own... :-)]
Anyway, still running under heavy load...
Cheers//Frank
--
Frank de Lange
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:04 ` Manfred Spraul
2001-01-12 19:07 ` Frank de Lange
@ 2001-01-12 19:21 ` Frank de Lange
2001-01-12 19:33 ` Manfred Spraul
1 sibling, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 19:21 UTC (permalink / raw)
To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds
On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
>
> Frank, could you double check it?
I'm currently running my own patched version, which uses
spin_lock_irq/spin_unlock_irq instead of
spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
comment on this?
Anyway, still running under load, also got USB (which uses the same irq) to
produce some interrupts by scanning some stuff. No problems so far...
Cheers//Frank
--
Frank de Lange
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:21 ` Frank de Lange
@ 2001-01-12 19:33 ` Manfred Spraul
2001-01-12 19:52 ` Frank de Lange
2001-01-12 23:35 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware Alan Cox
0 siblings, 2 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 19:33 UTC (permalink / raw)
To: Frank de Lange; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds
Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> > I removed the disable_irq lines from 8390.c, and that fixed the problem:
> > no hang within 2 minutes - the test is still running.
> >
> > Frank, could you double check it?
>
> I'm currently running my own patched version, which uses
> spin_lock_irq/spin_unlock_irq instead of
> spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
> spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
> the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
> comment on this?
>
It's a bit dangerous: _if_ one of the functions is called with disabled
local interrupts, then spin_unlock_irq would enable these interrupts.
That could cause other problems, but I haven't checked if these functions
are actually called with disabled interrupts - e.g. the transmit
function is called with enabled interrupts.
Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
* From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
* From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
Could you disable both bandaids? I disabled them, no problems so far.
Now back to the disable_irq_nosync().
--
Manfred
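The difference between the two lock variants discussed above can be shown with a tiny user-space model. This is a sketch only: a single boolean stands in for the CPU's local interrupt-enable flag, the spinlock itself is left out, and all names (model_lock_irq and so on) are made up; it is not the kernel's spinlock code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for the local CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Model of spin_lock_irq / spin_unlock_irq: disable, then enable unconditionally. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave / spin_unlock_irqrestore: remember and restore. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Interrupt state seen by the caller after a lock_irq/unlock_irq section. */
static bool state_after_irq_pair(bool enabled_on_entry)
{
    irqs_enabled = enabled_on_entry;
    model_lock_irq();
    assert(!irqs_enabled);          /* critical section runs with irqs off */
    model_unlock_irq();
    return irqs_enabled;
}

/* Interrupt state seen by the caller after an irqsave/irqrestore section. */
static bool state_after_irqsave_pair(bool enabled_on_entry)
{
    bool flags;

    irqs_enabled = enabled_on_entry;
    flags = model_lock_irqsave();
    assert(!irqs_enabled);          /* critical section runs with irqs off */
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}
```

For a caller that enters with interrupts already disabled, state_after_irq_pair(false) returns true (interrupts were switched on behind the caller's back), while state_after_irqsave_pair(false) returns false. When the caller enters with interrupts enabled, both variants behave the same.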
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:33 ` Manfred Spraul
@ 2001-01-12 19:52 ` Frank de Lange
2001-01-12 19:59 ` Linus Torvalds
2001-01-12 23:35 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware Alan Cox
1 sibling, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 19:52 UTC (permalink / raw)
To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds
On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
>
> * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
>
> Could you disable both bandaids? I disabled them, no problems so far.
I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
io_apic.c?), and reverted my own patch, added your patch... Now running with
the usual heavy network load, no problems so far... Also made USB produce
interrupts (shares irq with network), no problems...
Could this really be the solution?
Cheers//Frank
--
Frank de Lange
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:52 ` Frank de Lange
@ 2001-01-12 19:59 ` Linus Torvalds
2001-01-12 20:03 ` Ingo Molnar
` (3 more replies)
0 siblings, 4 replies; 90+ messages in thread
From: Linus Torvalds @ 2001-01-12 19:59 UTC
To: Frank de Lange; +Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
> On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> > Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
> >
> > * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> > * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
> >
> > Could you disable both bandaids? I disabled them, no problems so far.
>
> I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
> io_apic.c?), and reverted my own patch, added your patch... Now running with
> the usual heavy network load, no problems so far... Also made USB produce
> interrupts (shares irq with network), no problems...
>
> Could this really be the solution?

I'd like to know _which_ of the two makes a difference (or does it only
trigger with both of them enabled)? And even then I'm not sure that it is
"the" solution - both changes to io-apic handling had some reason for
them. Ingo, what was the focus-cpu thing?

		Linus
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:59 ` Linus Torvalds
@ 2001-01-12 20:03 ` Ingo Molnar
2001-01-14 0:13 ` Roeland Th. Jansen
2001-01-12 20:05 ` Frank de Lange
` (2 subsequent siblings)
3 siblings, 1 reply; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:03 UTC
To: Linus Torvalds
Cc: Frank de Lange, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Linus Torvalds wrote:
> [...] Ingo, what was the focus-cpu thing?

well, some time ago i had an ne2k card in an SMP system as well, and found
this very problem. Disabling/enabling focus-cpu appeared to make a
difference, but later on i made experiments that show that in both cases
the hang happens. I spent a good deal of time trying to fix this problem,
but failed - so any fresh ideas are more than welcome.

	Ingo
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 20:03 ` Ingo Molnar
@ 2001-01-14 0:13 ` Roeland Th. Jansen
2001-01-14 0:23 ` Frank de Lange
0 siblings, 1 reply; 90+ messages in thread
From: Roeland Th. Jansen @ 2001-01-14 0:13 UTC
To: Ingo Molnar
Cc: Linus Torvalds, Frank de Lange, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> well, some time ago i had an ne2k card in an SMP system as well, and found
> this very problem. Disabling/enabling focus-cpu appeared to make a
> difference, but later on i made experiments that show that in both cases
> the hang happens. I spent a good deal of time trying to fix this problem,
> but failed - so any fresh ideas are more than welcome.

for the record. my BP6, non OC, apic smp system with ne2k fails within
24 hours here too. if I can be of any help..... (2.4.0. kernel. no
vmware or opensound)

--
Grobbebol's Home                   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |        on Usenet.             _\_v
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-14 0:13 ` Roeland Th. Jansen
@ 2001-01-14 0:23 ` Frank de Lange
0 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-14 0:23 UTC
To: Roeland Th. Jansen
Cc: Ingo Molnar, Linus Torvalds, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

On Sun, Jan 14, 2001 at 12:13:58AM +0000, Roeland Th. Jansen wrote:
> On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> > well, some time ago i had an ne2k card in an SMP system as well, and found
> > this very problem. Disabling/enabling focus-cpu appeared to make a
> > difference, but later on i made experiments that show that in both cases
> > the hang happens. I spent a good deal of time trying to fix this problem,
> > but failed - so any fresh ideas are more than welcome.
>
> for the record. my BP6, non OC, apic smp system with ne2k fails within
> 24 hours here too. if I can be of any help..... (2.4.0. kernel. no
> vmware or opensound)

You can help yourself by applying Manfred's patch to 8390.c (in preference to
my own patch to the same file). This will solve the hanging-network problem. If
your entire box hangs, that's another story which will probably not be fixed by
that patch.

You can find the patch in Manfred's posting to the list from
Fri Jan 12 2001 - 14:04:24 EST.

I've been running a patched driver for more than a day now, under heavy network
load, without problems.

Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 19:59 ` Linus Torvalds
2001-01-12 20:03 ` Ingo Molnar
@ 2001-01-12 20:05 ` Frank de Lange
2001-01-12 20:11 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated? Manfred Spraul
2001-01-12 21:21 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
3 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 20:05 UTC
To: Linus Torvalds; +Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox

On Fri, Jan 12, 2001 at 11:59:25AM -0800, Linus Torvalds wrote:
> > Could this really be the solution?
>
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?

Well, with 'this' (in 'could THIS be') I really meant the move from disable_irq
to the irq_safe spinlocks. I'm currently running with the patched 8390.c
driver, patched io_apic (TARGET_CPUS 0xff) and patched apic.c (focus cpu
enabled), and have had no problems yet... even though I'm running several
simultaneous nfs cp -rd <big_dir>, streaming network audio, scanning with a
USB scanner, etc.

So far, it seems that the patch to 8390.c removed the symptoms. The changes to
apic.c and io_apic.c did not make the network hang come back.

Cheers//Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 19:59 ` Linus Torvalds
2001-01-12 20:03 ` Ingo Molnar
2001-01-12 20:05 ` Frank de Lange
@ 2001-01-12 20:11 ` Manfred Spraul
2001-01-12 20:16 ` Frank de Lange
2001-01-12 21:21 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
3 siblings, 1 reply; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 20:11 UTC
To: Linus Torvalds; +Cc: Frank de Lange, dwmw2, linux-kernel, mingo, Alan Cox

Linus Torvalds wrote:
>
>
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?
>

Frank, please clarify:
you still run without disable_irq_nosync() in 8390.c?

I have a first idea: we send an EOI to an interrupt that is masked on
the IO apic, perhaps that causes the problems.
I'm right now typing a patch.

--
	Manfred
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:11 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated? Manfred Spraul
@ 2001-01-12 20:16 ` Frank de Lange
2001-01-12 20:19 ` Ingo Molnar
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 20:16 UTC
To: Manfred Spraul; +Cc: Linus Torvalds, dwmw2, linux-kernel, mingo, Alan Cox

On Fri, Jan 12, 2001 at 09:11:29PM +0100, Manfred Spraul wrote:
> Frank, please clarify:
> you still run without disable_irq_nosync() in 8390.c?

I am running with your patched version of 8390.c (so WITHOUT
disable_irq_nosync()).
In addition, I patched apic.c (focus cpu enabled)
In addition, I patched io_apic (TARGET_CPUS 0xff)

> I have a first idea: we send an EOI to an interrupt that is masked on
> the IO apic, perhaps that causes the problems.

Sounds plausible...

> I'm right now typing a patch.

I'll await yours instead of making my own patch this time... :-)

Cheers//Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:16 ` Frank de Lange
@ 2001-01-12 20:19 ` Ingo Molnar
2001-01-12 20:26 ` Frank de Lange
0 siblings, 1 reply; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:19 UTC
To: Frank de Lange
Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
> In addition, I patched apic.c (focus cpu enabled)
> In addition, I patched io_apic (TARGET_CPUS 0xff)

please try it with the focus CPU enabling change (we want to enable that
feature, i only disabled it due to the stuck-ne2k bug), but with
TARGET_CPUS set to cpu_online_mask. (the latter is needed for certain
crappy BIOSes.)

i believe the ne2k driver change is the key.

> > I have a first idea: we send an EOI to an interrupt that is masked on
> > the IO apic, perhaps that causes the problems.
>
> Sounds plausible...

does not help. I've tried it (and many other combinations). I did not find
any direct workaround for this problem. (i tried very hard.)

	Ingo
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:19 ` Ingo Molnar
@ 2001-01-12 20:26 ` Frank de Lange
2001-01-12 20:31 ` Ingo Molnar
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 20:26 UTC
To: Ingo Molnar; +Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:19:53PM +0100, Ingo Molnar wrote:
> > In addition, I patched apic.c (focus cpu enabled)
> > In addition, I patched io_apic (TARGET_CPUS 0xff)
>
> please try it with the focus CPU enabling change (we want to enable that
> feature, i only disabled it due to the stuck-ne2k bug), but with
> TARGET_CPUS set to cpu_online_mask. (the latter is needed for certain
> crappy BIOSes.)

WITH or WITHOUT the changed 8390 driver? I can already give you the results for
running WITH the changed driver: it works. I have not yet tried it WITHOUT the
changed 8390 driver (so that would be stock 8390, patched apic.c, stock
io_apic.c). Please let me know which you want...

Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:26 ` Frank de Lange
@ 2001-01-12 20:31 ` Ingo Molnar
2001-01-12 20:35 ` Frank de Lange
2001-01-12 23:50 ` Alan Cox
0 siblings, 2 replies; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:31 UTC
To: Frank de Lange
Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
> WITH or WITHOUT the changed 8390 driver? I can already give you the
> results for running WITH the changed driver: it works. I have not yet
> tried it WITHOUT the changed 8390 driver (so that would be stock 8390,
> patched apic.c, stock io_apic.c). Please let me know which you want...

WITH. patched 8390.c, patched apic.c, stock io_apic.c. My very strong
feeling is that this will be a stable combination, and that this is what
we want as a final solution.

	Ingo
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:31 ` Ingo Molnar
@ 2001-01-12 20:35 ` Frank de Lange
2001-01-12 20:37 ` Ingo Molnar
2001-01-12 23:50 ` Alan Cox
1 sibling, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 20:35 UTC
To: Ingo Molnar; +Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:31:15PM +0100, Ingo Molnar wrote:
>
> On Fri, 12 Jan 2001, Frank de Lange wrote:
>
> > WITH or WITHOUT the changed 8390 driver? I can already give you the
> > results for running WITH the changed driver: it works. I have not yet
> > tried it WITHOUT the changed 8390 driver (so that would be stock 8390,
> > patched apic.c, stock io_apic.c). Please let me know which you want...
>
> WITH. patched 8390.c, patched apic.c, stock io_apic.c. My very strong
> feeling is that this will be a stable combination, and that this is what
> we want as a final solution.

It is. As I already mentioned in other messages, I already tested with JUST the
patched 8390.c driver, no other patches. It was stable. I then patched apic.c
AND io_apic.c, which did not introduce new instabilities. Unless you think that
reverting back to a stock io_apic.c would cause instabilities (which would be
weird, since I had no instabilities running only a patched 8390.c), I think the
patch to 8390.c DOES remove the symptoms all by itself. No other patches seem
necessary to get a stable box.

But I'll patch the mess again just for kicks :-)

Cheers//Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:35 ` Frank de Lange
@ 2001-01-12 20:37 ` Ingo Molnar
2001-01-12 20:46 ` David Woodhouse
` (2 more replies)
0 siblings, 3 replies; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:37 UTC
To: Frank de Lange
Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
> It is. As I already mentioned in other messages, I already tested with
> JUST the patched 8390.c driver, no other patches. It was stable. I
> then patched apic.c AND io_apic.c, which did not introduce new
> instabilities. Unless you think that reverting back to a stock
> io_apic.c would cause instabilities (which would be weird, since I had
> no instabilities running only a patched 8390.c), I think the patch to
> 8390.c DOES remove the symptoms all by itself. No other patches seem
> necessary to get a stable box.

okay - i just wanted to hear a definitive word from you that this fixes
your problem, because this is what we'll have to do as a final solution.
(barring any other solution.)

	Ingo
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:37 ` Ingo Molnar
@ 2001-01-12 20:46 ` David Woodhouse
2001-01-12 20:46 ` Frank de Lange
2001-01-12 20:54 ` Manfred Spraul
2 siblings, 0 replies; 90+ messages in thread
From: David Woodhouse @ 2001-01-12 20:46 UTC
To: Ingo Molnar
Cc: Frank de Lange, Manfred Spraul, Linus Torvalds, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Ingo Molnar wrote:
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)

Patching 8390.c won't fix this for me. The only thing on IRQ19 when I saw
interrupts die was usb-uhci, and that doesn't appear to use disable_irq.

But then again, I've only ever seen this happen once. It's not repeatable.

--
dwmw2
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:37 ` Ingo Molnar
2001-01-12 20:46 ` David Woodhouse
@ 2001-01-12 20:46 ` Frank de Lange
2001-01-12 20:51 ` Ingo Molnar
2001-01-13 0:15 ` Linus Torvalds
2001-01-12 20:54 ` Manfred Spraul
2 siblings, 2 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 20:46 UTC
To: Ingo Molnar; +Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:37:24PM +0100, Ingo Molnar wrote:
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)

Now running with this config:

PATCHED 8390.c (using irq_safe spinlocks instead of disable_irq)
PATCHED apic.c (focus cpu ENABLED)
STOCK io_apic.c

No problems under heavy network load.

Gentlemen, this (the patch to 8390.c) seems to fix the problem.

Cheers//Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:46 ` Frank de Lange
@ 2001-01-12 20:51 ` Ingo Molnar
2001-01-12 21:05 ` Frank de Lange
2001-01-13 0:15 ` Linus Torvalds
1 sibling, 1 reply; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:51 UTC
To: Frank de Lange
Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
> PATCHED 8390.c (using irq_safe spinlocks instead of disable_irq)
> PATCHED apic.c (focus cpu ENABLED)
> STOCK io_apic.c
>
> No problems under heavy network load.
>
> Gentlemen, this (the patch to 8390.c) seems to fix the problem.

great. Back when i had the same problem, flood pinging another host (on
the local network) was the quickest way to reproduce the hang:

	ping -f -s 10 otherhost

this produced an IOAPIC-hang within seconds.

	Ingo
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:51 ` Ingo Molnar
@ 2001-01-12 21:05 ` Frank de Lange
2001-01-15 2:00 ` Jorge Nerin
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 21:05 UTC
To: Ingo Molnar; +Cc: Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:51:36PM +0100, Ingo Molnar wrote:
> great. Back when i had the same problem, flood pinging another host (on
> the local network) was the quickest way to reproduce the hang:
>
> ping -f -s 10 otherhost
>
> this produced an IOAPIC-hang within seconds.

Apart from killing streaming audio and interactive network use, nothing hangs.
As soon as the ping flood is stopped, audio streams on and ssh sessions are
useable again. So, it seems to fix it...

Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 21:05 ` Frank de Lange
@ 2001-01-15 2:00 ` Jorge Nerin
0 siblings, 0 replies; 90+ messages in thread
From: Jorge Nerin @ 2001-01-15 2:00 UTC
To: Frank de Lange
Cc: Ingo Molnar, Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 09:51:36PM +0100, Ingo Molnar wrote:
> > great. Back when i had the same problem, flood pinging another host (on
> > the local network) was the quickest way to reproduce the hang:
> >
> > ping -f -s 10 otherhost
> >
> > this produced an IOAPIC-hang within seconds.
>
> Apart from killing streaming audio and interactive network use, nothing hangs.
> As soon as the ping flood is stopped, audio streams on and ssh sessions are
> useable again. So, it seems to fix it...
>
> Frank

I have a 3c503 and a ne2k-pci; both of them use the 8390. I can hang the
ne2k-pci easily by doing a ping -f: the bigger the packet size, the
earlier the hang. But I cannot hang the 3c503 this way.

Now with 2.4.0 the ne2k-pci behaviour is this: a ping -f works for some
amount of time, then stops for a BIG amount of time (several minutes),
and then it works again (it seems), but at a much slower speed, and if
you test it with a normal ping (ping host) you don't get replies. The
packets really go out on the wire and replies are even sent back, but I
don't receive them.

Previous versions of 2.4.0-testX caused ne2k-pci to hang and stay hung
until reboot.

System: Mb Gigabyte 586dx, 2x200MMX, 96Mb RAM

--
Jorge Nerin
<comandante@zaralinux.com>
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:46 ` Frank de Lange
2001-01-12 20:51 ` Ingo Molnar
@ 2001-01-13 0:15 ` Linus Torvalds
2001-01-13 0:19 ` Frank de Lange
1 sibling, 1 reply; 90+ messages in thread
From: Linus Torvalds @ 2001-01-13 0:15 UTC
To: Frank de Lange; +Cc: Ingo Molnar, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

On Fri, 12 Jan 2001, Frank de Lange wrote:
>
> Gentlemen, this (the patch to 8390.c) seems to fix the problem.

The problem with this patch is that anybody with a slow ISA ne2000 clone
will basically have absolutely _horrible_ interrupt latency because we
hold the irq lock over some quite expensive operations.

The spin_lock_irqsave() is absolutely my preferred fix, and if I remember
correctly this is in fact how some early 2.1.x code fixed the ne2000
driver when the original irq scalability stuff happened (for some time
during development we did not have a working "disable_irq()" AT ALL
because the irq-disabling counters etc logic hadn't been done).

The spinlock was changed to "disable_irq()" by a patch from Alan, if I
remember correctly, exactly because people couldn't access serial lines at
any kind of high speeds otherwise - even on "reasonable" hardware. Alan
may remember details better.

The fact is that as a general design principle we should _not_ be using
"disable_irq/enable_irq" anyway. BUT there are some real-world
concerns that make the "better" spinlock handling have some problems too.

So yes, I bet the spinlock fixes it. But..

		Linus
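The "irq-disabling counters" Linus mentions can be pictured as a per-line nesting depth: only the first disable masks the line and only the last matching enable unmasks it. The sketch below is a user-space illustration with hypothetical names (`sim_irq_line`, `sim_disable_irq`, `sim_enable_irq`); it is not the kernel's implementation and ignores SMP synchronization entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line with nested disables. */
struct sim_irq_line {
	unsigned int depth;	/* number of outstanding disable calls */
	bool masked;		/* is the line masked at the controller? */
};

/* First disable masks the line; further disables only bump the depth. */
void sim_disable_irq(struct sim_irq_line *line)
{
	if (line->depth++ == 0)
		line->masked = true;
}

/* Last matching enable unmasks the line; unbalanced enables are ignored. */
void sim_enable_irq(struct sim_irq_line *line)
{
	if (line->depth == 0)
		return;
	if (--line->depth == 0)
		line->masked = false;
}

void sim_irq_depth_demo(void)
{
	struct sim_irq_line nic = { 0, false };

	sim_disable_irq(&nic);
	sim_disable_irq(&nic);
	sim_enable_irq(&nic);
	assert(nic.masked);	/* still one disable outstanding */
	sim_enable_irq(&nic);
	assert(!nic.masked);	/* balanced again, line is live */
}
```

The trade-off discussed in the message follows from the scope of each mechanism: masking one line leaves every other interrupt source (for example a serial port) serviceable during a slow copy, whereas holding a lock with local interrupts off delays all of them on that CPU.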
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-13 0:15 ` Linus Torvalds
@ 2001-01-13 0:19 ` Frank de Lange
2001-01-13 0:29 ` Alan Cox
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-13 0:19 UTC
To: Linus Torvalds; +Cc: Ingo Molnar, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 04:15:37PM -0800, Linus Torvalds wrote:
> On Fri, 12 Jan 2001, Frank de Lange wrote:
> >
> > Gentlemen, this (the patch to 8390.c) seems to fix the problem.
>
> The problem with this patch is that anybody with a slow ISA ne2000 clone
> will basically have absolutely _horrible_ interrupt latency because we
> hold the irq lock over some quite expensive operations.
>
> The spin_lock_irqsave() is absolutely my preferred fix, and if I remember
> correctly this is in fact how some early 2.1.x code fixed the ne2000
> driver when the original irq scalability stuff happened (for some time
> during development we did not have a working "disable_irq()" AT ALL
> because the irq-disabling counters etc logic hadn't been done).

And that's the patch I meant... Manfred's
spin_lock_irqsave/spin_unlock_irqrestore based one, not my
(spin_lock_irq/spin_unlock_irq) based patch. That is also the one I'm running
now.

Frank
--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-13 0:19 ` Frank de Lange
@ 2001-01-13 0:29 ` Alan Cox
0 siblings, 0 replies; 90+ messages in thread
From: Alan Cox @ 2001-01-13 0:29 UTC
To: Frank de Lange
Cc: Linus Torvalds, Ingo Molnar, Manfred Spraul, dwmw2, linux-kernel, Alan Cox

> > The spin_lock_irqsave() is absolutely my preferred fix, and if I remember
> > correctly this is in fact how some early 2.1.x code fixed the ne2000
> > driver when the original irq scalability stuff happened (for some time
> > during development we did not have a working "disable_irq()" AT ALL
> > because the irq-disabling counters etc logic hadn't been done).
>
> And that's the patch I meant... Manfred's
> spin_lock_irqsave/spin_unlock_irqrestore based one, not my
> (spin_lock_irq/spin_unlock_irq) based patch. That is also the one I'm running
> now.

The old code did it with #ifdef __SMP__ tests so it only screwed up SMP boxes,
which at the time was quite acceptable because real people didn't have them
and certainly at the price didn't put ne2000's in them 8)

The basic problem is that you cannot allow

	- set multicast list
	- open/close
	- irq
	- transmit

to occur except when serialized because of the indirection and register
gunge on the chip. The copies are slow and long enough that they prevent
28.8 modem sessions being usable.

I'd have to check the chip manual to be sure you can even disable the irqs
without corrupting an in progress transfer
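Why indirect register access forces the four entry points above to be serialized can be shown with a toy model. This is a deliberately simplified sketch, assuming a page-select style of indirection as a stand-in for the "indirection and register gunge"; the `sim_nic` structure, its sizes, and the function names are all hypothetical and do not describe the real chip's register map.

```c
#include <assert.h>
#include <stdint.h>

enum { SIM_PAGES = 2, SIM_REGS = 16 };

/* Hypothetical device: one selector decides which page a write hits. */
struct sim_nic {
	unsigned int page;
	uint8_t regs[SIM_PAGES][SIM_REGS];
};

void sim_select_page(struct sim_nic *nic, unsigned int page)
{
	nic->page = page % SIM_PAGES;
}

void sim_write_reg(struct sim_nic *nic, unsigned int reg, uint8_t val)
{
	nic->regs[nic->page][reg % SIM_REGS] = val;
}

/*
 * One entry point wants to write 0xAB to page 0, register 5.
 * If `interrupted` is non-zero, a second entry point runs between the
 * page select and the write and leaves page 1 selected.
 */
void sim_path_a(struct sim_nic *nic, int interrupted)
{
	sim_select_page(nic, 0);
	if (interrupted)
		sim_select_page(nic, 1);	/* the unserialized intruder */
	sim_write_reg(nic, 5, 0xAB);
}

void sim_page_race_demo(void)
{
	struct sim_nic ok = { 0 };
	struct sim_nic raced = { 0 };

	sim_path_a(&ok, 0);
	assert(ok.regs[0][5] == 0xAB);		/* landed where intended */

	sim_path_a(&raced, 1);
	assert(raced.regs[0][5] == 0x00);	/* intended register untouched */
	assert(raced.regs[1][5] == 0xAB);	/* write hit the wrong page */
}
```

Any mechanism that keeps the select-then-access sequence atomic with respect to the other entry points avoids the misdirected write; the thread's disagreement is over which mechanism costs least.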
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?
2001-01-12 20:37 ` Ingo Molnar
2001-01-12 20:46 ` David Woodhouse
2001-01-12 20:46 ` Frank de Lange
@ 2001-01-12 20:54 ` Manfred Spraul
2001-01-12 21:07 ` Frank de Lange
2 siblings, 1 reply; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 20:54 UTC
To: mingo; +Cc: Frank de Lange, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

Ingo Molnar wrote:
>
>
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)
>

Ingo, is that possible?

The current fix is "disable_irq_nosync() and enable_irq() cause
deadlocks with level triggered ioapic irqs, do not use them" - I'm sure
ne2k-pci isn't the only driver that uses these functions.

I have found one combination that doesn't hang with the unpatched
8390.c, but network throughput is down to 1/2. I hope that's due to the
debugging changes.

I'll restart now from a fresh 2.4.0 tree:

Changes:
1) enable focus cpu.
2) apply the attached patch.

I'm not sure if it's a real fix or if it just hides the problem: my
sysrq patch has shown that clearing and setting the "level trigger" bit
in the io apic reanimates the IO APIC.

--
	Manfred

[-- Attachment #2: patch-io --]
[-- Type: text/plain, Size: 1698 bytes --]

--- build-2.4/arch/i386/kernel/io_apic.c.orig	Fri Jan 12 20:17:36 2001
+++ build-2.4/arch/i386/kernel/io_apic.c	Fri Jan 12 21:26:31 2001
@@ -134,6 +134,30 @@
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+DO_ACTION( __trigger_level, 0, |= 0x00008000, io_apic_sync(entry->apic))/* mask = 1 */
+DO_ACTION( __trigger_edge, 0, &= 0xffff7fff, ) /* mask = 0 */
+
+
+static void unmask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__trigger_level_IO_APIC_irq(irq);
+	__unmask_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+static void mask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__mask_IO_APIC_irq(irq);
+	__trigger_edge_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
 static void unmask_IO_APIC_irq (unsigned int irq)
 {
 	unsigned long flags;
@@ -143,6 +167,7 @@
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+
 void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
@@ -1181,14 +1206,14 @@
  */
 static unsigned int startup_level_ioapic_irq (unsigned int irq)
 {
-	unmask_IO_APIC_irq(irq);
+	unmask_level_IO_APIC_irq(irq);
 
 	return 0; /* don't check for pending */
 }
 
-#define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq	unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
+#define shutdown_level_ioapic_irq	mask_level_IO_APIC_irq
+#define enable_level_ioapic_irq	unmask_level_IO_APIC_irq
+#define disable_level_ioapic_irq	mask_level_IO_APIC_irq
 
 static void end_level_ioapic_irq (unsigned int i)
 {
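The bit operations in the patch above can be tried out on a plain 32-bit value. The trigger-mode constant (0x00008000, bit 15) is taken from the patch; the mask constant (0x00010000, bit 16) is an assumption based on the usual IO-APIC redirection-entry layout and does not appear in this message. The function names and the sample entry value are hypothetical, and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_TRIGGER_LEVEL 0x00008000u	/* bit 15, as used in the patch */
#define RTE_MASKED        0x00010000u	/* bit 16, assumed standard layout */

/* Mirrors mask_level_IO_APIC_irq(): mask first, then switch to edge. */
uint32_t rte_mask_level(uint32_t low)
{
	low |= RTE_MASKED;
	low &= ~RTE_TRIGGER_LEVEL;
	return low;
}

/* Mirrors unmask_level_IO_APIC_irq(): back to level, then unmask. */
uint32_t rte_unmask_level(uint32_t low)
{
	low |= RTE_TRIGGER_LEVEL;
	low &= ~RTE_MASKED;
	return low;
}

void rte_toggle_demo(void)
{
	uint32_t entry = 0x0000a031u;	/* made-up entry: level, unmasked */
	uint32_t masked = rte_mask_level(entry);

	assert(masked == 0x00012031u);			/* masked + edge */
	assert(rte_unmask_level(masked) == entry);	/* round trip */
}
```

Every disable/enable pair therefore cycles the entry through edge mode and back, which is the "clearing and setting the level trigger bit" that the message reports as reviving a stuck IO-APIC pin.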
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
From: Frank de Lange @ 2001-01-12 21:07 UTC
To: Manfred Spraul
Cc: mingo, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote:
> I have found one combination that doesn't hang with the unpatched
> 8390.c, but network throughput is down to 1/2. I hope that's due to the
> debugging changes.

Hm, could it be that the fact that network throughput is halved causes the
problem not to appear? Remember, it only appears under HEAVY network load. A
single nfs cp -rd <big_dir> was not enough to hang my network, I needed to add
at least another cp -rd or some streaming audio or something else...

Cheers//Frank
--
Frank de Lange, <Hacker for Hire>
+31-320-252965, frank@unternet.org
[ "Omnis enim res, quae dando non deficit, dum habetur
   et non datur, nondum habetur, quomodo habenda est." ]

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
From: Manfred Spraul @ 2001-01-12 21:31 UTC
To: Frank de Lange
Cc: mingo, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

Frank de Lange wrote:
>
> On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote:
> > I have found one combination that doesn't hang with the unpatched
> > 8390.c, but network throughput is down to 1/2. I hope that's due to the
> > debugging changes.
>
> Hm, could it be that the fact that network throughput is halved causes the
> problem not to appear?

No. The problem is still there. But now I get lots of lost packets
instead of a total hang.
Because of the modification of mask_irq, disable_irq_nosync and
enable_irq now act as if I pressed SysRq+q every millisecond, so the
io apic is immediately reset when it gets stuck.

Btw, my initial assumption about EOI to masked interrupt must be wrong:
2.2 always first masks the irq, then it sends the EOI, and 2.2 doesn't
hang.

--
	Manfred

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
From: Alan Cox @ 2001-01-12 23:50 UTC
To: mingo
Cc: Frank de Lange, Manfred Spraul, Linus Torvalds, dwmw2, linux-kernel, Alan Cox

> WITH. patched 8390.c, patched apic.c, stock io_apic.c. My very strong
> feeling is that this will be a stable combination, and that this is what
> we want as a final solution.

If you do that please #ifdef SMP all the changes. It's impossible to use a
modem and an ne2K together on a typical PC otherwise. The copy from the NE2K
with irq disabled is just _SO_ slow you drop bytes continually. I did all the
horrible magic in the ne2k driver for a reason.

The other alternative is to provide a way to force the system back out of
apic mode so the ne2K driver can do a goodbye_apic_crap() type call.

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
From: Frank de Lange @ 2001-01-12 21:21 UTC
To: Linus Torvalds
Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox

> Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
> Anything else that might be relevant?

Well, sideways of course! :-)

Here's a cat /proc/interrupts from the (BP6) box:

           CPU0       CPU1
  0:     104936     105433    IO-APIC-edge  timer
  1:       4444       4384    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:         79         59    IO-APIC-edge  serial
  4:      12743      12850    IO-APIC-edge  serial
 14:       7855       7885    IO-APIC-edge  ide0
 15:       1990       1703    IO-APIC-edge  ide1
 16:          0          0   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         24         28   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  bttv
 19:     460435     460402   IO-APIC-level  eth0, eth1, usb-uhci
NMI:     210303     210303
LOC:     210285     210284
ERR:          0

The interrupt which caused problems was 19 (with both network cards and USB on
it). It shows a high number of interrupts because I've been load-testing the
network. The mere fact that it shows this high number of interrupts shows the
fix works...

As this is a BP6, I'm now supposed to go on about the dead chickens, dedicated
air conditioners, nuclear power supplies and other magic you're supposed to buy
to get these boards running. Well, nothing of that sort: it is running on a
simple (but high quality) 235W PSU with heatgreased coolers on the CPUs and the
BX chipset. Nothing is overclocked. CPU and chipset temperatures are 24°C and
32°C, respectively. In short, nothing remarkable.

All PCI slots are used, as you can see from my first posting in this thread
(which contains more info on the hardware).

//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Alan Cox @ 2001-01-12 23:35 UTC
To: Manfred Spraul
Cc: Frank de Lange, dwmw2, linux-kernel, mingo, Alan Cox, torvalds

> Could you disable both bandaids? I disabled them, no problems so far.
> Now back to the disable_irq_nosync().

Ok so it looks like the disable_irq code is buggy. Unfortunately it's not
just used for these drivers; they are just the heaviest users.

Given that we can see the IRQ is still set on the APIC, can we fake an IRQ in
that condition?

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Manfred Spraul @ 2001-01-13 0:06 UTC
To: Alan Cox
Cc: Frank de Lange, dwmw2, linux-kernel, mingo, torvalds

Alan Cox wrote:
>
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately its not
> just used for these drivers they are just the heaviest users.
>
> Given that we can see the IRQ is still set on the APIC can we fake an IRQ in
> that condition ?

Switch to edge triggered, wait, switch back to level triggered - I've
added that to my sysrq commands, and it clears the IRR bit in the IO
APIC. Sometimes 2 or 3 attempts are required.

I tried several changes to the ioapic code, but without success.

I found:
* the io_apic_sync() in __mask_IO_APIC_irq() is probably wrong: I'd say
  it should be _after_ the io_apic_modify(), but currently it is before
  the io_apic_modify(). 2.2 uses the same code, so that doesn't explain
  the problem.
* this comment in
  http://oss.sgi.com/www.linux.sgi.com/intel/visws/visws.2210.28jul99.patch

+/* XXX ouch... is this really our only choice for masking this interrupt? */
+/* XXX not fully safe for 2 reasons:
+ * 1) should not touch an apic entry while (whole) apic is enabled
+ * 2) careful about storing to IRR bit (unless we know this intr is idle)
+ */
+static void disable_cobalt_irq(unsigned int irq) /* disable_irq() */
+{
+	int ent = is_co_apic(irq);
+	if (ent == -1) {
+		return; /* could actually be a panic */
+	}
+
+	/* Note: h/w nada like read-mod-write! Vec saved in IRQ_VECTOR() */
+	co_apic_write(CO_APIC_LO(ent), CO_APIC_MASK);
+	(void)co_apic_read(CO_APIC_LO(ent)); /* sync cpu to cobalt apic */
+}
+

It's possible that the comment is about a Cobalt-specific problem; the
IRR bit is - according to the Intel documentation - read only.

--
	Manfred

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Linus Torvalds @ 2001-01-13 0:36 UTC
To: Alan Cox
Cc: Manfred Spraul, Frank de Lange, dwmw2, linux-kernel, mingo

On Fri, 12 Jan 2001, Alan Cox wrote:

> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately its not
> just used for these drivers they are just the heaviest users.

It may well not be disable_irq() that is buggy. In fact, there's good
reason to believe that it's a hardware problem.

		Linus

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Frank de Lange @ 2001-01-13 0:48 UTC
To: Linus Torvalds
Cc: Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.

I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
wouldn't the problem occur more frequently in other irq-heavy areas? A quick
count shows that disable_irq* is used in 84 source files in the driver/*
directory. This includes drivers which generate many interrupts in a short
timeframe (like ide).

Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Linus Torvalds @ 2001-01-13 0:56 UTC
To: Frank de Lange
Cc: Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

On Sat, 13 Jan 2001, Frank de Lange wrote:

> On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> > It may well not be disable_irq() that is buggy. In fact, there's good
> > reason to believe that it's a hardware problem.
>
> I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
> wouldn't the problem occur more frequently in other irq-heavy areas? A quick
> count shows that disable_irq* is used in 84 sourcefiles in the driver/*
> directory. This includes drivers which generate many interrupts in a short
> timeframe (like ide).

IDE is not my favourite example of a "known stable driver". Also, in many
cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
it's still considered an ISA interrupt). Which probably wouldn't show this
problem anyway.

Also, IDE doesn't generate all that many interrupts. You can make a
network driver do a _lot_ more interrupts than just about any disk driver
by simply sending/receiving a lot of packets. With disks it is very hard
to get the same kind of irq load - Linux will merge the requests and do at
least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,
you can probably _easily_ generate a much higher stream of interrupts.

		Linus

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Frank de Lange @ 2001-01-13 1:27 UTC
To: Linus Torvalds
Cc: Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

On Fri, Jan 12, 2001 at 04:56:24PM -0800, Linus Torvalds wrote:
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> problem anyway.

They (ide interrupts) are indeed EDGE-triggered on my box. I have not enabled
the HPT366 (ATA66) controller on this board, so I can not tell if that
controller is EDGE-triggered as well.

> Also, IDE doesn't generate all that many interrupts. You can make a
> network driver do a _lot_ more interrupts than just about any disk driver
> by simply sending/receiving a lot of packets. With disks it is very hard
> to get the same kind of irq load - Linux will merge the requests and do at
> least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,
> you can probably _easily_ generate a much higher stream of interrupts.

There's sound... The msnd.c (Turtle Beach MultiSound) driver (and its
derivatives, like msnd_pinnacle) uses disable_irq. Running esd (esound daemon),
sound can easily generate > 1000 interrupts/second, since esd uses small dma
transfers. This can be seen quite clearly from /proc/interrupts on my
soundserver:

           CPU0
  0:  276867328          XT-PIC  timer
  1:          2          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:    7631519          XT-PIC  eth1
  4:    2751419          XT-PIC  serial
  5: 1907346678          XT-PIC  soundblaster
  8:          1          XT-PIC  rtc
  9:   45022986          XT-PIC  eth0
 13:          1          XT-PIC  fpu
 14:    4320643          XT-PIC  ide0
 15:    4409193          XT-PIC  ide1
NMI:          0

OK, this is an ageing P166, and it uses a different driver, etc. I have not
found any reports of hanging sound drivers in Google queries for 'linux msnd
bp6' or 'linux multisound bp6'.

Of course, this is no conclusive evidence, far from it...

It could be that people using those cards are not the ones who tend to go for
the (somewhat tricky) BP6 board...

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Manfred Spraul @ 2001-01-13 1:51 UTC
To: Frank de Lange
Cc: Linus Torvalds, Alan Cox, dwmw2, linux-kernel, mingo

Frank de Lange wrote:
>
> It could be that people using those cards are not the ones who tend
> to go for the (somewhat tricky) BP6 board...
>

I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
board and I doubt that Ingo used a BP6. Perhaps it is 82093AA specific
(the IO APIC chip used for SMP 440BX boards).

I can't find any spec updates for that chip: either it's the first
perfect chip Intel ever produced, or ...

--
	Manfred

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Frank de Lange @ 2001-01-13 2:11 UTC
To: Manfred Spraul
Cc: Linus Torvalds, Alan Cox, dwmw2, linux-kernel, mingo

On Sat, Jan 13, 2001 at 02:51:54AM +0100, Manfred Spraul wrote:
> Frank de Lange wrote:
> >
> > It could be that people using those cards are not the ones who tend
> > to go for the (somewhat tricky) BP6 board...
> >
>
> I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
> board and I doubt that Ingo used an BP6. Perhaps 82093AA specific (the
> IO APIC chip used for SMP 440BX board)

It isn't. But I just meant to indicate that the mere fact that I could not find
any problem report for that combination does not indicate that there ARE no
problems...

> I can't find any spec updates for that chip: either it's the first
> perfect chip Intel ever produced, or ...

:-)

Well, the BX chipset is one of their better attempts I think...

Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Jens Axboe @ 2001-01-13 1:49 UTC
To: Linus Torvalds
Cc: Frank de Lange, Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

On Fri, Jan 12 2001, Linus Torvalds wrote:
> [...] With disks it is very hard
> to get the same kind of irq load - Linux will merge the requests and do at
> least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,

Actually, without mult count you will do only 512 bytes of I/O per
interrupt on IDE, regardless of merging etc. That still doesn't reach
nic levels, but it's _bad_ anyway :-)

--
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Andrew Morton @ 2001-01-13 2:12 UTC
To: Linus Torvalds
Cc: Frank de Lange, Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

Linus Torvalds wrote:
>
> On Sat, 13 Jan 2001, Frank de Lange wrote:
>
> > On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> > > It may well not be disable_irq() that is buggy. In fact, there's good
> > > reason to believe that it's a hardware problem.
> >
> > I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
> > wouldn't the problem occur more frequently in other irq-heavy areas? A quick
> > count shows that disable_irq* is used in 84 sourcefiles in the driver/*
> > directory. This includes drivers which generate many interrupts in a short
> > timeframe (like ide).
>
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> problem anyway.
>

3c59x calls disable_irq() once per minute, and seems to be
one of the most-affected drivers.

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Linus Torvalds @ 2001-01-13 2:48 UTC
To: Andrew Morton
Cc: Frank de Lange, Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

On Sat, 13 Jan 2001, Andrew Morton wrote:
>
> 3c59x calls disable_irq() once per minute, and seems to be
> one of the most-affected drivers.

The ne2k thing seems to be the _most_ affected one, as far as I can tell.

However, it could easily be a matter of timing - for example, if the
driver does something to trigger an interrupt _just_ before (or after,
considering the asynchronous nature of writing to the IO-APIC) doing the
enable_irq(), then..

I'm also nervous about the complete lack of locking in vortex_timer():
disabling interrupts doesn't mean that transmits couldn't be
pending. But maybe the hardware is ok with changing status concurrently.

		Linus

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Andrew Morton @ 2001-01-13 3:24 UTC
To: Linus Torvalds
Cc: Frank de Lange, Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

Linus Torvalds wrote:
>
> I'm also nervous about the complete lack of locking in vortex_timer():
> disabling interrupts doesn't mean that transmits couldn't be
> pending. But maybe the hardware is ok with changing status concurrently.
>

mm.. It's a little racy wrt vortex_ioctl(), but otherwise OK.
del_timer_sync() in vortex_ioctl() seems to be needed.

disable_irq() is very useful in functions such as this. It
would be a shame to have to stop using it.

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
From: Russell King @ 2001-01-13 12:37 UTC
To: Andrew Morton
Cc: Linus Torvalds, Frank de Lange, Alan Cox, Manfred Spraul, dwmw2, linux-kernel, mingo

Andrew Morton writes:
> Linus Torvalds wrote:
> > I'm also nervous about the complete lack of locking in vortex_timer():
> > disabling interrupts doesn't mean that transmits couldn't be
> > pending. But maybe the hardware is ok with changing status concurrently.
>
> disable_irq() is very useful in functions such as this. It
> would be a shame to have to stop using it.

Don't the NCR53C9x SCSI drivers use disable_irq() a lot? Do they have
any problems?

--
Russell King, rmk@arm.linux.org.uk
http://www.arm.linux.org.uk/personal/aboutme.html
THE developer of ARM Linux

* Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...)
From: Manfred Spraul @ 2001-01-13 15:18 UTC
To: Russell King
Cc: Andrew Morton, Linus Torvalds, Frank de Lange, Alan Cox, dwmw2, linux-kernel, mingo

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

Russell King wrote:
>
> Doesn't the NCR53C9x SCSI drivers use disable_irq() a lot? Do they have
> any problems?
>

It seems that a certain timing is necessary: one flood ping or a single
ncp usually doesn't trigger any problems, but 2 concurrent flood pings
hang the network after 5-10 seconds.

It's not multi processor specific: both Frank and I can trigger it when
we boot with one cpu.

So far it seems that only the io apic for the BX chipset is affected
(only SMP BX boards contain an io apic).

Any volunteers with ne2k-pci cards and other motherboards that include
an io apic (e.g. all Intel motherboards that use an IO Controller Hub,
Via Apollo Pro133, Pro133A, KX133)?

Please:
* apply the attached patch.
* compile the kernel for SMP, or at least enable uniprocessor io apic
  support.
* reboot.
* flood ping the computer with 2 concurrent flood pings from a second
  computer.
* wait one minute.

According to the ICH2 documentation the IRR bit on the IO APIC is
writable - that's either a documentation error, or it could cause
further problems.

--
	Manfred

[-- Attachment #2: patch-focus --]
[-- Type: text/plain, Size: 316 bytes --]

--- linux/arch/i386/kernel/apic.c	Tue Dec  5 21:43:48 2000
+++ linux/arch/i386/kernel/apic.c.new	Sat Jan 13 15:54:56 2001
@@ -270,7 +270,7 @@
 	 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 	 *   BX chipset. ]
 	 */
-#if 0
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else

* Re: Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...)
From: Manfred Spraul @ 2001-01-13 23:55 UTC
To: linux-kernel

It seems that no one uses a Ne2000 compatible pci NIC with a newer
motherboard (every K7 board, Intel 8xx boards, via apollo pro 133), but
I've set up a tiny web site that describes my problem:

	colorfullife.com/~manfred/io_apic

--
	Manfred

* Re: Call for testers: ne2k-pci and io apic
From: J . A . Magallon @ 2001-01-14 0:18 UTC
To: Manfred Spraul
Cc: linux-kernel

On 2001.01.13 Manfred Spraul wrote:
>
> Any volunteers with ne2k-pci cards and other motherboards that include
> an io apic (e.g. all Intel motherboards that use an IO Controller Hub,
> Via Apollo Pro133, Pro133A, KX133)?
>
> Please:
> * apply the attached patch.
> * compile the kernel for SMP, or at least enable uniprocessor io apic
> support.
> * reboot.
> * flood ping the computer with 2 concurrent flood pings from a second
> computer.
> * wait one minute.
>

Volunteer. Mine is a Realtek 8029:

00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8029(AS)
        Flags: medium devsel, IRQ 11
        I/O ports at ef40 [size=32]

Board: SuperMicro P6DGU (440GX,PIIX4), 256Mb, 2xPII@400(Deschutes,512Kb).

The only problem is that I just have a cable connection. Is a 'loopback'
connection (i.e., pinging myself) enough? Does the traffic at least reach
the card and generate interrupts, or does it stop at the kernel software
level?

Anyway, if a 128K connection can generate enough flood for you, I'll
send you my results. I am going to test it anyway...

dmesg selected info:

I/O APIC #2 Version 17 at 0xFEC00000.
Int: type 3, pol 0, trig 0, bus 2, IRQ 00, APIC ID 2, APIC INT 00
Int: type 0, pol 0, trig 0, bus 2, IRQ 01, APIC ID 2, APIC INT 01
Int: type 0, pol 0, trig 0, bus 2, IRQ 00, APIC ID 2, APIC INT 02
Int: type 0, pol 0, trig 0, bus 2, IRQ 03, APIC ID 2, APIC INT 03
Int: type 0, pol 0, trig 0, bus 2, IRQ 04, APIC ID 2, APIC INT 04
Int: type 0, pol 0, trig 0, bus 2, IRQ 06, APIC ID 2, APIC INT 06
Int: type 0, pol 0, trig 0, bus 2, IRQ 07, APIC ID 2, APIC INT 07
Int: type 0, pol 1, trig 1, bus 2, IRQ 08, APIC ID 2, APIC INT 08
Int: type 0, pol 0, trig 0, bus 2, IRQ 09, APIC ID 2, APIC INT 09
Int: type 0, pol 0, trig 0, bus 2, IRQ 0c, APIC ID 2, APIC INT 0c
Int: type 0, pol 0, trig 0, bus 2, IRQ 0d, APIC ID 2, APIC INT 0d
Int: type 0, pol 0, trig 0, bus 2, IRQ 0e, APIC ID 2, APIC INT 0e
Int: type 0, pol 0, trig 0, bus 2, IRQ 0f, APIC ID 2, APIC INT 0f
Int: type 0, pol 3, trig 3, bus 2, IRQ 0a, APIC ID 2, APIC INT 10
Int: type 0, pol 3, trig 3, bus 2, IRQ 0b, APIC ID 2, APIC INT 11
Int: type 0, pol 3, trig 3, bus 2, IRQ 0b, APIC ID 2, APIC INT 12
Int: type 0, pol 3, trig 3, bus 2, IRQ 05, APIC ID 2, APIC INT 13
Int: type 2, pol 0, trig 0, bus 2, IRQ 00, APIC ID 2, APIC INT 17
Lint: type 3, pol 0, trig 0, bus 0, IRQ 00, APIC ID ff, APIC LINT 00
Lint: type 1, pol 0, trig 0, bus 0, IRQ 00, APIC ID ff, APIC LINT 01
Processors: 2
mapped APIC to ffffe000 (fee00000)
mapped IOAPIC to ffffd000 (fec00000)
..
ENABLING IO-APIC IRQs
..changing IO-APIC physical APIC ID to 2 ... ok.
Synchronizing Arb IDs.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-5, 2-10, 2-11, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=49 pin1=2 pin2=0
activating NMI Watchdog ... done.
testing NMI watchdog ... OK.
number of MP IRQ sources: 18.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
... register #00: 02000000
...... : physical APIC id: 02
... register #01: 00170011
...... : max redirection entries: 0017
...... : IO APIC version: 0011
... register #02: 00000000
...... : arbitration: 00
... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 003 03  0    0    0   0   0    1    1    39
 02 003 03  0    0    0   0   0    1    1    31
 03 003 03  0    0    0   0   0    1    1    41
 04 003 03  0    0    0   0   0    1    1    49
 05 000 00  1    0    0   0   0    0    0    00
 06 003 03  0    0    0   0   0    1    1    51
 07 003 03  0    0    0   0   0    1    1    59
 08 003 03  0    0    0   0   0    1    1    61
 09 003 03  0    0    0   0   0    1    1    69
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 003 03  0    0    0   0   0    1    1    71
 0d 000 00  1    0    0   0   0    0    0    00
 0e 003 03  0    0    0   0   0    1    1    79
 0f 003 03  0    0    0   0   0    1    1    81
 10 003 03  1    1    0   1   0    1    1    89
 11 003 03  1    1    0   1   0    1    1    91
 12 003 03  1    1    0   1   0    1    1    91
 13 003 03  1    1    0   1   0    1    1    99
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 19
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ9 -> 9
IRQ10 -> 16
IRQ11 -> 17-> 18
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
.................................... done.
calibrating APIC timer ...
..... CPU clock speed is 400.9147 MHz.
..... host bus clock speed is 100.2284 MHz.
cpu: 0, clocks: 1002284, slice: 334094
CPU0<T0:1002272,T1:668176,D:2,S:334094,C:1002284>
cpu: 1, clocks: 1002284, slice: 334094
CPU1<T0:1002272,T1:334080,D:4,S:334094,C:1002284>
checking TSC synchronization across CPUs: passed.
Setting commenced=1, go go go
..
Board Vendor: Supermicro Inc..
Board Name: Intel 440BX/440GX.

--
J.A. Magallon                                         $> cd pub
mailto:jamagallon@able.es                             $> more beer

Linux werewolf 2.4.0-ac8 #1 SMP Fri Jan 12 18:02:50 CET 2001 i686

* Re: Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...)
2001-01-13 15:18 ` Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...) Manfred Spraul
2001-01-13 23:55 ` Manfred Spraul
2001-01-14 0:18 ` Call for testers: ne2k-pci and io apic J . A . Magallon
@ 2001-01-14 0:23 ` J . A . Magallon
2001-01-14 2:14 ` Call for testers: ne2k-pci and io apic J . A . Magallon
3 siblings, 0 replies; 90+ messages in thread
From: J . A . Magallon @ 2001-01-14 0:23 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel

On 2001.01.13 Manfred Spraul wrote:
>
> Please:
> * apply the attached patch.
> --
> Manfred
> --- linux/arch/i386/kernel/apic.c Tue Dec 5 21:43:48 2000
> +++ linux/arch/i386/kernel/apic.c.new Sat Jan 13 15:54:56 2001
> @@ -270,7 +270,7 @@
>      * PCI Ne2000 networking cards and PII/PIII processors, dual
>      * BX chipset. ]
>      */
> -#if 0
> +#if 1
>     /* Enable focus processor (bit==0) */
>     value &= ~(1<<9);
>  #else
>

In my 2.4.0-ac9, that code goes to line 315 and looks like:

     * BX chipset. ]
     */
#if 0
    /* Enable focus processor (bit==0) */
    value &= ~APIC_SPIV_FOCUS_DISABLED;
#else
    /* Disable focus processor (bit==1) */
    value |= APIC_SPIV_FOCUS_DISABLED;
#endif
    /*
     * Set spurious IRQ vector

--
J.A. Magallon $> cd pub
mailto:jamagallon@able.es $> more beer
Linux werewolf 2.4.0-ac8 #1 SMP Fri Jan 12 18:02:50 CET 2001 i686
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
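The two spellings in the hunks above appear to touch the same bit of the spurious-interrupt vector register. A small user-space sketch of that bit manipulation follows. It assumes that `APIC_SPIV_FOCUS_DISABLED` equals `(1 << 9)`, as the literal in the older hunk suggests; the helper names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumed value: bit 9, matching the literal (1<<9) in the older hunk. */
#define APIC_SPIV_FOCUS_DISABLED (1u << 9)

/* "Enable focus processor (bit==0)": clear bit 9, leave the rest alone. */
static inline uint32_t spiv_focus_enable(uint32_t value)
{
    return value & ~APIC_SPIV_FOCUS_DISABLED;
}

/* "Disable focus processor (bit==1)": set bit 9, leave the rest alone. */
static inline uint32_t spiv_focus_disable(uint32_t value)
{
    return value | APIC_SPIV_FOCUS_DISABLED;
}
```

Flipping `#if 0` to `#if 1` in the patch selects the first helper's behaviour instead of the second; nothing else in the register value changes.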
* Re: Call for testers: ne2k-pci and io apic
2001-01-13 15:18 ` Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...) Manfred Spraul
` (2 preceding siblings ...)
2001-01-14 0:23 ` Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...) J . A . Magallon
@ 2001-01-14 2:14 ` J . A . Magallon
3 siblings, 0 replies; 90+ messages in thread
From: J . A . Magallon @ 2001-01-14 2:14 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel

On 2001.01.13 Manfred Spraul wrote:
>
> Any volunteers with ne2k-pci cards and other motherboards that include
> an io apic (e.g. all Intel motherboards that use an IO Controller Hub,
> Via Apollo Pro133, Pro133A, KX133)?
>

In my case (440GX/BX, PIIX4), the network goes down (with either a ping
flood or some web browsing) with this message:

Jan 14 03:01:25 werewolf kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 14 03:01:57 werewolf last message repeated 19 times

--
J.A. Magallon $> cd pub
mailto:jamagallon@able.es $> more beer
Linux werewolf 2.4.0-ac9 #2 SMP Sun Jan 14 01:46:07 CET 2001 i686
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-13 0:56 ` Linus Torvalds
` (2 preceding siblings ...)
2001-01-13 2:12 ` Andrew Morton
@ 2001-01-15 16:15 ` Zdenek Kabelac
3 siblings, 0 replies; 90+ messages in thread
From: Zdenek Kabelac @ 2001-01-15 16:15 UTC (permalink / raw)
To: Linus Torvalds

Linus Torvalds wrote:
>
> On Sat, 13 Jan 2001, Frank de Lange wrote:
>
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> problem anyway.

I've been having a similar problem with my 3c59x network card. My BP6
was hanging with about 50% probability while running netscape from an
NFS-mounted /usr partition. For the 2.2 kernel I created the patch
below, which solved the problem (I had no further problems until
recently, when I started to use UDMA4 with ATA66, but that seems to be
related to a BX chipset problem and as far as I know Andre is working
on that issue).

While using this patch I got many messages about irq_enter in my kernel
log, but the system was stable and ran quite happily.

With the latest 2.4 kernels I do not have the same problems (at least
so far), so I assume the problem was hidden somewhere inside NFS.

However, with 2.4.0 and the ac patches I'm seeing a problem where one
CPU locks up and the computer becomes unusable for no apparent reason:
there is no network traffic and no intensive IDE activity. All I can do
is move the mouse within XFree (so maybe it's an XFree bug, but that is
hard to tell; all I can say is that I'm not seeing such problems with
-test12 kernels).

Here is the patch for 2.2 I've been using:

--- linux.orig/arch/i386/kernel/irq.h Wed May 31 11:21:04 2000
+++ linux/arch/i386/kernel/irq.h Wed May 31 11:21:47 2000
@@ -138,8 +138,22 @@
 static inline void irq_enter(int cpu, unsigned int irq)
 {
     hardirq_enter(cpu);
-    while (test_bit(0,&global_irq_lock)) {
-        /* nothing */;
+    if (global_irq_holder == cpu && test_bit(0, &global_irq_lock)) {
+        printk(KERN_WARNING "irq_enter - CPU:%d already holder (count:%
+            cpu, global_irq_count, local_irq_count[cpu]);
+        /* avoid deadlock bellow */
+        clear_bit(0,&global_irq_lock);
+    } else {
+        unsigned long i = 1000000;
+        while (test_bit(0,&global_irq_lock) && i) {
+            i--;
+            /* nothing */;
+        }
+        if (!i) {
+            clear_bit(0,&global_irq_lock);
+            printk(KERN_WARNING "irq_enter - loop timeout CPU:%d Holder:%d!
+                cpu, global_irq_holder);
+        }
     }
 }
 printk(KERN_WARNING "irq_enter - loop timeout CPU:%d Holder:%d!

--
There are three types of people in the world:
those who can count, and those who can't.
Zdenek Kabelac http://i.am/kabi/ kabi@i.am {debian.org; fi.muni.cz}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
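The core of the patch above is a busy-wait with an iteration budget instead of an unbounded spin. Below is a user-space sketch of just that pattern, using C11 atomics in place of the kernel's `test_bit()` on `global_irq_lock`. The names `demo_irq_lock` and `wait_for_unlock` are hypothetical. Unlike the patch, the sketch leaves the lock untouched on timeout and only reports it, since forcibly clearing a lock someone else may still hold is a band-aid rather than a fix.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for bit 0 of the kernel's global_irq_lock. */
static atomic_bool demo_irq_lock;

/*
 * Spin until the lock is observed free or the iteration budget is used up.
 * Returns true if the lock was seen free, false on timeout.
 */
static bool wait_for_unlock(atomic_bool *lock, unsigned long budget)
{
    while (atomic_load(lock)) {
        if (budget == 0)
            return false;   /* gave up: the caller decides what to do */
        budget--;
    }
    return true;
}
```

A caller would log the timeout, as the patch does with `printk()`, and then choose between recovering and giving up.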
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-13 0:36 ` Linus Torvalds
2001-01-13 0:48 ` Frank de Lange
@ 2001-01-13 1:38 ` Manfred Spraul
1 sibling, 0 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-13 1:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, Frank de Lange, dwmw2, linux-kernel, mingo

Linus Torvalds wrote:
>
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.
>

Perhaps a problem with the 82093AA external IO APIC used on 440BX boards?
I haven't seen any reports from newer Intel boards (the ICH2 includes an
IO APIC) or from Via boards.

The problem is not SMP specific: Frank wrote that it also occurred when he
booted with "maxcpus=1".

--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 23:35 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware Alan Cox
2001-01-13 0:06 ` Manfred Spraul
2001-01-13 0:36 ` Linus Torvalds
@ 2001-01-13 2:10 ` Andrew Morton
2 siblings, 0 replies; 90+ messages in thread
From: Andrew Morton @ 2001-01-13 2:10 UTC (permalink / raw)
To: Alan Cox
Cc: Manfred Spraul, Frank de Lange, dwmw2, linux-kernel, mingo,
torvalds, Donald Becker

Alan Cox wrote:
>
> > Could you disable both bandaids? I disabled them, no problems so far.
> > Now back to the disable_irq_nosync().
>
> Ok so it looks like the disable_irq code is buggy. Unfortunately its not
> just used for these drivers they are just the heaviest users.
>
> Given that we can see the IRQ is still set on the APIC can we fake an IRQ in
> that condition ?

As you think about this problem, please bear in mind that
this loss of APIC interrupts also occurs in 2.2 kernels.
Donald may have more details.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 17:16 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Manfred Spraul
2001-01-12 17:33 ` Frank de Lange
@ 2001-01-12 17:49 ` Alan Cox
2001-01-12 18:08 ` Manfred Spraul
2001-01-12 22:03 ` Latest status of IDE patches from Andre Jeff Nguyen
1 sibling, 2 replies; 90+ messages in thread
From: Alan Cox @ 2001-01-12 17:49 UTC (permalink / raw)
To: Manfred Spraul; +Cc: dwmw2, linux-kernel, frank

> Frank, could you try what happens with the NMI oopser disabled?
>
> The second major difference I'm immediately aware of is the number of
> the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
> 2.4 the highest priority.

I'm trying to remember what they were, but some APIC versions do have errata
and something about 3 irqs at the same priority level rings a bell.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 17:49 ` Alan Cox
@ 2001-01-12 18:08 ` Manfred Spraul
2001-01-12 18:16 ` Ingo Molnar
` (2 more replies)
2001-01-12 22:03 ` Latest status of IDE patches from Andre Jeff Nguyen
1 sibling, 3 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 18:08 UTC (permalink / raw)
To: Alan Cox; +Cc: dwmw2, linux-kernel, frank

Alan Cox wrote:
>
> > Frank, could you try what happens with the NMI oopser disabled?
> >
> > The second major difference I'm immediately aware of is the number of
> > the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
> > 2.4 the highest priority.
>
> I'm trying to remember what they were, but some APIC versions do have errata
> and something about 3 irqs at the same priority level rings a bell.

The PPro local apic documentation says:
<<<<<<<
The processor's local APIC includes an in-service entry and a holding
entry for each priority level. To avoid losing interrupts, software
should allocate no more than 2 interrupt vectors per priority.
>>>>>>>>

Ok, we must reorder the vector numbers for our own interrupts
(0xfb-0xff), but that doesn't explain our problems: we don't lose
reschedule interrupts, we have problems with normal interrupts - and
there we only use 2 irqs at the same priority level.

Btw, the kick patch I sent a few minutes ago revives my io apic.

--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:08 ` Manfred Spraul
@ 2001-01-12 18:16 ` Ingo Molnar
2001-01-12 18:45 ` Manfred Spraul
2001-01-12 18:28 ` Linus Torvalds
2001-01-12 19:05 ` Frank de Lange
2 siblings, 1 reply; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 18:16 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Alan Cox, dwmw2, Linux Kernel List, frank

On Fri, 12 Jan 2001, Manfred Spraul wrote:

> The PPro local apic documentation says:
> <<<<<<<
> The processor's local APIC includes an in-service entry and a holding
> entry for each priority level. To avoid losing interrupts, software
> should allocate no more than 2 interrupt vectors per priority.
> >>>>>>>>
>
> Ok, we must reorder the vector numbers for our own interrupts
> (0xfb-0xff), but that doesn't explain our problems: we don't lose
> reschedule interrupts, we have problems with normal interrupts - and
> there we only use 2 irqs at the same priority level.

we *already* reorder vector numbers and spread them out as much as
possible. We do this in 2.2 as well. We did this almost from day 1 of
IO-APIC support. If any manually allocated IRQ vector creates a '3 vectors
in the same 16-vector region' situation then that's a bug in hw_irq.h..

the 'loss of interrupts' above does not include external interrupts, only
local interrupts (such as the APIC timer interrupt) can get lost in such a
situation.

(nevertheless there is something going on.)

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:16 ` Ingo Molnar
@ 2001-01-12 18:45 ` Manfred Spraul
2001-01-12 18:48 ` Ingo Molnar
0 siblings, 1 reply; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 18:45 UTC (permalink / raw)
To: mingo; +Cc: Alan Cox, dwmw2, Linux Kernel List, frank

Ingo Molnar wrote:
>
> we *already* reorder vector numbers and spread them out as much as
> possible. We do this in 2.2 as well. We did this almost from day 1 of
> IO-APIC support. If any manually allocated IRQ vector creates a '3 vectors
> in the same 16-vector region' situation then that's a bug in hw_irq.h..
>

I reread the 2.2 and 2.4 code:

2.2 spreads all vectors, even the internal interrupts (tlb flush,
reschedule, timer, call function) are spread.
Hmm. AFAICS there is one bug: 3 interrupts are in priority 5:

CALL_FUNCTION_VECTOR    0x50
IRQ0_TRAP_VECTOR        0x51
irq1: irq0+8            0x59

check arch/i386/kernel/irq.h and assign_irq_vector (io_apic.c)

2.4 spreads the vectors for the external (hardware, from io apic)
interrupts, but 5 ipi vectors have the same priority: reschedule, call
function, tlb invalidate, apic error, spurious interrupt.

But that doesn't explain what happens with ne2k cards: neither 2.2 nor
2.4 have more than 2 interrupts in class for the hardware interrupt
16/19.

With 2.2 I don't have any problems, 2.4 hangs.

--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
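The priority-class arithmetic discussed above can be sketched in a few lines of user-space C. The sketch assumes only what the thread states: a vector's priority class is its upper four bits (the "16-vector region"), and no class should hold more than two vectors. The names `vector_priority_class` and `overcommitted_classes` are illustrative and do not exist in the kernel.

```c
#include <stddef.h>

/* Limit quoted from the PPro local APIC documentation in the thread. */
#define MAX_VECTORS_PER_CLASS 2

/* The priority class of a vector is its upper four bits. */
static unsigned vector_priority_class(unsigned vector)
{
    return (vector >> 4) & 0xf;
}

/*
 * Count how many of the given vectors fall into each of the 16 classes.
 * counts[] is filled in; the return value is the number of classes that
 * hold more than MAX_VECTORS_PER_CLASS vectors.
 */
static unsigned overcommitted_classes(const unsigned char *vectors, size_t n,
                                      unsigned counts[16])
{
    unsigned over = 0;
    size_t i;

    for (i = 0; i < 16; i++)
        counts[i] = 0;
    for (i = 0; i < n; i++)
        counts[vector_priority_class(vectors[i])]++;
    for (i = 0; i < 16; i++)
        if (counts[i] > MAX_VECTORS_PER_CLASS)
            over++;
    return over;
}
```

Given the three 2.2 vectors listed above (0x50, 0x51, 0x59), the check finds three vectors in class 5. Given the 0xfb-0xff range mentioned for the 2.4 internal interrupts, it finds five vectors in class 0xf.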
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:45 ` Manfred Spraul
@ 2001-01-12 18:48 ` Ingo Molnar
0 siblings, 0 replies; 90+ messages in thread
From: Ingo Molnar @ 2001-01-12 18:48 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Alan Cox, dwmw2, Linux Kernel List, frank

On Fri, 12 Jan 2001, Manfred Spraul wrote:

> 2.4 spreads the vectors for the external (hardware, from io apic)
> interrupts, but 5 ipi vectors have the same priority: reschedule, call
> function, tlb invalidate, apic error, spurious interrupt.

my reading of the errata is that the lost APIC timer IRQ happens only if
the APIC timer IRQ vector's priority level has more than 2 active vectors.
It's a very limited case, which does not happen in recent CPUs anyway
(such as the PIII).

> But that doesn't explain what happens with ne2k cards: neither 2.2 nor
> 2.4 have more than 2 interrupts in class for the hardware interrupt
> 16/19.

yep.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:08 ` Manfred Spraul
2001-01-12 18:16 ` Ingo Molnar
@ 2001-01-12 18:28 ` Linus Torvalds
2001-01-12 23:27 ` Alan Cox
2001-01-12 19:05 ` Frank de Lange
2 siblings, 1 reply; 90+ messages in thread
From: Linus Torvalds @ 2001-01-12 18:28 UTC (permalink / raw)
To: linux-kernel

In article <3A5F4827.2E443786@colorfullife.com>,
Manfred Spraul <manfred@colorfullife.com> wrote:
>The processor's local APIC includes an in-service entry and a holding
>entry for each priority level. To avoid losing interrupts, software
>should allocate no more than 2 interrupt vectors per priority.
>>>>>>>>>
>
>Ok, we must reorder the vector numbers for our own interrupts
>(0xfb-0xff), but that doesn't explain our problems: we don't lose
>reschedule interrupts, we have problems with normal interrupts - and
>there we only use 2 irqs at the same priority level.
>
>Btw, the kick patch I sent a few minutes ago revives my io apic.

Does this seem to happen mainly with drivers that use "disable_irq()"
and "enable_irq()"? I know the ne drivers do (through the 8390 module),
and some others do too (3c59x).

"disable_irq()"/"enable_irq()" has always tended to be slightly
problematic. It's not a set of semantics that maps well onto all
interrupt controllers (io-apic definitely included). Drivers would
generally be better off if they disabled their own chip from sending
interrupts, rather than disabling the interrupt line the chip is on.

(Of course, most drivers would be even _better_ off if they didn't play
games with irq disabling at all, but I think the 8390 driver does it
because otherwise it would suck too badly for words).

If you are seeing this with a 8390 core, try to see if the problem goes
away if you remove the "disable_irq_nosync(dev->irq);" and
"enable_irq()" thing (which means that you need to change the
spinlocking at the same place to use irq-safe versions - this _will_
make for bad interrupt latency especially with ISA ne2000 cards, but it
would be interesting to hear if it makes the problem less likely to
happen).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:28 ` Linus Torvalds
@ 2001-01-12 23:27 ` Alan Cox
2001-01-13 0:35 ` Linus Torvalds
0 siblings, 1 reply; 90+ messages in thread
From: Alan Cox @ 2001-01-12 23:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

> interrupt controllers (io-apic definitely included). Drivers would
> generally be better off if they disabled their own chip from sending
> interrupts, rather than disabling the interrupt line the chip is on.

That doesn't work very well because the device irq can arrive a measurable
number of clocks after you disable it on the source and there is no way to
say 'and be sure the stupid thing has propagated the apic bus'

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 23:27 ` Alan Cox
@ 2001-01-13 0:35 ` Linus Torvalds
2001-01-13 0:43 ` Alan Cox
0 siblings, 1 reply; 90+ messages in thread
From: Linus Torvalds @ 2001-01-13 0:35 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

On Fri, 12 Jan 2001, Alan Cox wrote:

> > interrupt controllers (io-apic definitely included). Drivers would
> > generally be better off if they disabled their own chip from sending
> > interrupts, rather than disabling the interrupt line the chip is on.
>
> That doesn't work very well because the device irq can arrive a measurable
> number of clocks after you disable it on the source and there is no way to
> say 'and be sure the stupid thing has propagated the apic bus'

I agree. Asynchronous buses are nasty that way.

However, if your "locking" is based on device state, you can still do it:
you just have to have an internal device protocol. The simplest example of
this is:

    interrupt_handler()
    {
        status = readl(dev->status);
        if (status & MY_IRQ_DISABLE)
            return;
    }

    {
        status = readl(dev->status);
        writel(status | MY_IRQ_DISABLE, dev->status);
        spin_lock();
        .. critical region ..
        spin_unlock();
        writel(status, dev->status);
    }

Notice how the above does NOT guarantee that the physical interrupt might
not be "in flight". It does, however, synchronize other ways, simply by
virtue of using the _synchronous_ bus (PCI, in this case) to figure out
when the asynchronous bus (interrupts) has a bogus event.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-13 0:35 ` Linus Torvalds
@ 2001-01-13 0:43 ` Alan Cox
2001-01-13 0:48 ` Linus Torvalds
0 siblings, 1 reply; 90+ messages in thread
From: Alan Cox @ 2001-01-13 0:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, linux-kernel

> interrupt_handler()
> {
>     status = readl(dev->status);
>     if (status & MY_IRQ_DISABLE)
>         return;

Unfortunately on the 8390 the IRQ status register is on page 0. The code
on the other CPU might not be on page 0. That means we can't even safely
check if there is an irq pending or clear it down (bad news on ne2k-pci)
without getting that lock.

That means we have to be able to just block that one irq source to avoid
horrible SMP latency problems.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-13 0:43 ` Alan Cox
@ 2001-01-13 0:48 ` Linus Torvalds
0 siblings, 0 replies; 90+ messages in thread
From: Linus Torvalds @ 2001-01-13 0:48 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

On Sat, 13 Jan 2001, Alan Cox wrote:

> > interrupt_handler()
> > {
> >     status = readl(dev->status);
> >     if (status & MY_IRQ_DISABLE)
> >         return;
>
> Unfortunately on the 8390 the IRQ status register is on page 0. The code
> on the other CPU might not be on page 0. That means we can't even safely
> check if there is an irq pending or clear it down (bad news on ne2k-pci)
> without getting that lock.

Alan. THINK.

The "synchronous channel" can be _memory_.

I just used the above as an example of using a synchronous channel to make
sure that the asynchronous buses aren't screwing us. You only need one bit
of synchronous information. The example used the very same bit that the
irq flag on the device is anyway - which works on a lot of devices simply
because they need to read the status flags anyway in order to handle the
interrupt.

But if you have problems with that, just use an atomic flag off
"dev->private" instead. Big deal.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
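The "atomic flag in memory" idea above can be illustrated with a small user-space sketch. It uses C11 atomics rather than kernel primitives, and every name in it (`struct demo_dev`, `irq_masked`, `demo_interrupt_handler`, `demo_critical_region`, `demo_irq_arrives_in_flight`) is hypothetical. It is a single-threaded illustration of the gating only: a real driver would still need the spinlock from the earlier pseudocode to order the handler against the critical region, and with a level-triggered source an ignored interrupt would be raised again rather than lost.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dev {
    atomic_bool irq_masked;   /* hypothetical flag in driver-private memory */
    unsigned handled;         /* interrupts the handler acted on */
    unsigned ignored;         /* interrupts treated as bogus, in-flight events */
};

/* Handler: consult the synchronous flag before touching the device. */
static bool demo_interrupt_handler(struct demo_dev *dev)
{
    if (atomic_load(&dev->irq_masked)) {
        dev->ignored++;       /* event arrived while the driver was busy */
        return false;
    }
    dev->handled++;
    return true;
}

/* Simulates an interrupt that was already in flight when the flag was set. */
static void demo_irq_arrives_in_flight(struct demo_dev *dev)
{
    demo_interrupt_handler(dev);
}

/* Driver path: raise the flag, run the critical region, drop the flag. */
static void demo_critical_region(struct demo_dev *dev,
                                 void (*body)(struct demo_dev *))
{
    atomic_store(&dev->irq_masked, true);
    if (body != NULL)
        body(dev);
    atomic_store(&dev->irq_masked, false);
}
```

Calling `demo_critical_region(&dev, demo_irq_arrives_in_flight)` shows the handler backing off for the late event, while a call to the handler before or after the region is processed normally.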
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 18:08 ` Manfred Spraul
2001-01-12 18:16 ` Ingo Molnar
2001-01-12 18:28 ` Linus Torvalds
@ 2001-01-12 19:05 ` Frank de Lange
2001-01-12 20:04 ` Linus Torvalds
2 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 19:05 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Alan Cox, dwmw2, linux-kernel

As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
the 8390 core driver, and replaced the spinlocks with irq-safe versions. This
seems to solve the network hangs, as I am currently running a heavy network
load (which would have killed a non-patched driver within seconds). Network
latency seems a bit higher, and there are some hiccups in the streaming audio
(part of the network load, easy indicator of performance...), but no hangs.

Here's the patch:

--- linux/drivers/net/8390.c.org Fri Jan 12 19:52:38 2001
+++ linux/drivers/net/8390.c Fri Jan 12 19:54:50 2001
@@ -242,15 +242,15 @@

     /* Ugly but a reset can be slow, yet must be protected */

-    disable_irq_nosync(dev->irq);
-    spin_lock(&ei_local->page_lock);
+    /* disable_irq_nosync(dev->irq); */
+    spin_lock_irq(&ei_local->page_lock);

     /* Try to restart the card. Perhaps the user has fixed something. */
     ei_reset_8390(dev);
     NS8390_init(dev, 1);

-    spin_unlock(&ei_local->page_lock);
-    enable_irq(dev->irq);
+    spin_unlock_irq(&ei_local->page_lock);
+    /* enable_irq(dev->irq); */

     netif_wake_queue(dev);
 }
@@ -285,9 +285,9 @@
      * Slow phase with lock held.
      */

-    disable_irq_nosync(dev->irq);
+    /* disable_irq_nosync(dev->irq); */

-    spin_lock(&ei_local->page_lock);
+    spin_lock_irq(&ei_local->page_lock);

     ei_local->irqlock = 1;

@@ -383,8 +383,8 @@
     ei_local->irqlock = 0;
     outb_p(ENISR_ALL, e8390_base + EN0_IMR);

-    spin_unlock(&ei_local->page_lock);
-    enable_irq(dev->irq);
+    spin_unlock_irq(&ei_local->page_lock);
+    /* enable_irq(dev->irq); */

     dev_kfree_skb (skb);
     ei_local->stat.tx_bytes += send_length;

--
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 19:05 ` Frank de Lange
@ 2001-01-12 20:04 ` Linus Torvalds
2001-01-15 14:36 ` Roeland Th. Jansen
0 siblings, 1 reply; 90+ messages in thread
From: Linus Torvalds @ 2001-01-12 20:04 UTC (permalink / raw)
To: linux-kernel

In article <20010112200541.A25675@unternet.org>,
Frank de Lange <frank@unternet.org> wrote:
>As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
>the 8390 core driver, and replaced the spinlocks with irq-safe versions. This
>seems to solve the network hangs, as I am currently running a heavy network
>load (which would have killed a non-patched driver within seconds). Network
>latency seems a bit higher, and there are some hiccups in the streaming audio
>(part of the network load, easy indicator of performance...), but no hangs.

Ok, so it's tentatively the IOAPIC disable/enable code. But it could
obviously be something that just interacts with it, including just a
timing issue (ie the _real_ bug might just be bad behaviour when
changing IO-APIC state at the same time as an interrupt happens, and
disable/enable-irq just happen to be the only things that do it at a
high enough frequency that you can see the problem).

Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
Anything else that might be relevant?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware
2001-01-12 20:04 ` Linus Torvalds
@ 2001-01-15 14:36 ` Roeland Th. Jansen
0 siblings, 0 replies; 90+ messages in thread
From: Roeland Th. Jansen @ 2001-01-15 14:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

On Fri, Jan 12, 2001 at 12:04:21PM -0800, Linus Torvalds wrote:
> Ok, so it's tentatively the IOAPIC disable/enable code. But it could
> obviously be something that just interacts with it, including just a
> timing issue (ie the _real_ bug might just be bad behaviour when
> changing IO-APIC state at the same time as an interrupt happens, and
> disable/enable-irq just happen to be the only things that do it at a
> high enough frequency that you can see the problem).

my BP6 with the patch frank sent me and the apic code at line 273 (or so)
defined as '1' and a flood ping :

Jan 14 19:56:19 grobbebol kernel: APIC error on CPU1: 02(02)
Jan 14 19:56:25 grobbebol kernel: APIC error on CPU1: 02(02)
Jan 14 19:58:10 grobbebol last message repeated 2 times
Jan 14 20:00:01 grobbebol kernel: APIC error on CPU1: 02(02)
Jan 14 20:01:11 grobbebol last message repeated 2 times
Jan 14 20:01:48 grobbebol kernel: APIC error on CPU1: 02(02)
Jan 14 20:01:59 grobbebol kernel: APIC error on CPU1: 02(08)
Jan 14 20:02:10 grobbebol kernel: APIC error on CPU1: 08(08)
Jan 14 20:02:39 grobbebol kernel: APIC error on CPU1: 08(02)
Jan 14 20:02:39 grobbebol kernel: unexpected IRQ trap at vector 8d
Jan 14 20:15:32 grobbebol kernel: APIC error on CPU1: 02(08)
[....]

and the network is dead. however, no crashes seen during this.

--
Grobbebol's Home                   | Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address  /\
Linux 2.2.16 SMP 2x466MHz / 256 MB | on Usenet.                   _\_v
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
* Latest status of IDE patches from Andre
2001-01-12 17:49 ` Alan Cox
2001-01-12 18:08 ` Manfred Spraul
@ 2001-01-12 22:03 ` Jeff Nguyen
1 sibling, 0 replies; 90+ messages in thread
From: Jeff Nguyen @ 2001-01-12 22:03 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

Hi Alan,

Will you incorporate Andre's IDE patches into the 2.2.19preXX kernel?
If not, do you have any plan to do so in the very near future?

I know this issue has been discussed before. But there has not been
any progress.

I have been following the kernel development for years. I have seen
many drivers added to the kernel. Some weren't working properly and
had to be removed quickly. Strangely, the IDE patches drag on and on
after all the kernel releases. Are you still concerned about the
stability of the driver?

Regards,
Jeff
ASL Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 90+ messages in thread
[parent not found: <20010112165104.A22465@unternet.org>]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
[not found] <20010112165104.A22465@unternet.org>
@ 2001-01-16 19:23 ` Maciej W. Rozycki
0 siblings, 0 replies; 90+ messages in thread
From: Maciej W. Rozycki @ 2001-01-16 19:23 UTC (permalink / raw)
To: Frank de Lange; +Cc: Andrew Morton, linux-kernel
On Fri, 12 Jan 2001, Frank de Lange wrote:
[I've cut syslog junk away for clarity -- you could just do `dmesg -s
32768'.]
> before network hang
> ===================
[...]
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[...]
> 13 0FF 0F 0 1 0 1 0 1 1 99
[...]
> printing local APIC contents on CPU#0/0:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000010000000000000001000000
                           ^
[...]
> printing local APIC contents on CPU#1/1:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000000000000000000001000000
                           ^
[...]
Here everything is fine. Vector 0x99 (the one you are having troubles
with) is set up as level-triggered (Trig is 1) and the respective bits of
the Trigger Mode Register (TMR) of the local APIC of both CPUs are set,
i.e. the last 0x99 IRQ processed was level-triggered as expected. As a
part of the usual inter-APIC handshake for level-triggered interrupts an
EOI message was sent to the originating I/O APIC (IRR is 0) to inform it
that it's free to send the IRQ again if still asserted.
> after network hang
> ==================
[...]
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[...]
> 13 0FF 0F 0 1 1 1 0 1 1 99
[...]
> printing local APIC contents on CPU#1/1:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000000000000000000001000000
                           ^
> printing local APIC contents on CPU#0/0:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000010000000000000000000000
                           ^ Gotcha!
[...]
Here the last 0x99 IRQ delivered to CPU#1 was fine, just like before.
But the last 0x99 IRQ CPU#0 received was apparently delivered as
edge-triggered -- the respective bit of the Trigger Mode Register (TMR)
of the local APIC is cleared. Hence the local APIC decided no EOI message
is needed for the originating I/O APIC as edge-triggered interrupts are
always sent by an I/O APIC whenever arriving (it's not possible for
level-triggered ones as an IRQ storm would result). Upon receiving an EOI
command from Linux the local APIC decides everything is finished and the
I/O APIC is left stuck with the IRR bit set to 1. It's still waiting for
an EOI message to arrive before it sends any further 0x99 IRQs.
How could it happen? Well, I guess a transmission error could have
happened that remained unnoticed by the checksumming hardware. As the
checksum algorithm is pretty trivial -- a cumulative sum of 2-bit values
-- it might just have happened two bits got toggled. I believe such
errors are happening due to marginal hardware -- not every i386 SMP box
shows this problem, even if a high volume of level-triggered interrupts
is observed.
Thank you very much for the log. I already have an idea how to
automatically recover from such a situation. No driver change is
required -- apparently no driver is at fault, it's just a load of the
inter-APIC bus, I/O APICs (including system bus accesses) or the whole
system in general.
I'm hereby asking everyone not to modify drivers just to circumvent APIC
lock-ups. Especially if such changes would "punish" perfectly good
systems. It won't cure anything -- it might only make it happen less
frequently due to different conditions.
I hope to have changes ready to test by the next week.
Maciej
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
^ permalink raw reply [flat|nested] 90+ messages in thread
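[Editorial aside] The point about the checksum can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the checksum is modelled as the sum of consecutive 2-bit groups truncated to two bits, and the bit strings are arbitrary made-up payloads, not real APIC bus messages. The function name `checksum` is invented for this example.

```python
def checksum(bits):
    """Sum consecutive 2-bit groups, keeping only the low two bits (assumed model)."""
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    total = sum(int(bits[i:i + 2], 2) for i in range(0, len(bits), 2))
    return total & 0b11


# Arbitrary example payload, chosen only to show the effect.
original = "1001100100"
# Two single-bit toggles: one group drops by 1 (01 -> 00), another rises by 1 (00 -> 01).
corrupted = "1001100001"

flipped = sum(a != b for a, b in zip(original, corrupted))
print(flipped, checksum(original), checksum(corrupted))
```

In this model the two toggles cancel, so both strings carry the same checksum and the corruption would pass unnoticed.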
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
@ 2001-01-12 16:53 Manfred Spraul
2001-01-12 17:02 ` David Woodhouse
0 siblings, 1 reply; 90+ messages in thread
From: Manfred Spraul @ 2001-01-12 16:53 UTC (permalink / raw)
To: frank, linux-kernel
Let's decode it:
> IO APIC #2......
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> 12 0FF 0F 0 1 0 1 0 1 1 91
> 13 0FF 0F 0 1 1 1 0 1 1 99
IRR for interrupt 19 is set, that means the IO APIC has sent the
interrupt to a cpu but not yet received the corresponding EOI. That bit
is read only, so we can't set it to 0 to kick the io apic.
The vector is 0x99, we must check that bit in the ISR, TMR and IRR of
both cpus.
cpu1:
> ISR: all bits 0
> TMR: only bit 0x99 is set
> IRR: all bits 0
cpu0:
> ISR: all bits 0
> TMR: only bit 0x89 is set
> IRR: bit 0xfc and bit 0xef are set.
ISR is the in-service register, 0 means that the cpu is not processing
an interrupt right now.
TMR is the trigger mode register, 1 means that the local apic should
send an EOI to the io apic when the cpu signals the EOI to the local
apic.
IRR is the list of pending interrupts: 0xef is the local timer
interrupt, 0xfc is the reschedule interrupt (see
include/asm-i386/hw_irq.h)
These bits are also read only.
If you are looking for the IO APIC documentation: the document number is
29056601 - just search with google. The local APIC is documented in the
main cpu handbook (PPro or later), in the chapter about multiple
processor management.
--
Manfred
^ permalink raw reply [flat|nested] 90+ messages in thread
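[Editorial aside] The row decoding above can be sketched in a few lines. The sketch assumes the columns are whitespace-separated and appear in the order of the header line quoted in the message; NR and Vect are read as hexadecimal, which is how the message treats them (NR 13 is interrupt 19). The names `decode_row` and `waiting_for_eoi` are invented for illustration.

```python
HEADER = "NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect".split()


def decode_row(line):
    """Split one redirection-table row into a dict keyed by the header names."""
    values = line.split()
    if len(values) != len(HEADER):
        raise ValueError("unexpected number of columns: %r" % line)
    return dict(zip(HEADER, values))


def waiting_for_eoi(entry):
    """True if a level-triggered entry still has its IRR bit set."""
    return entry["Trig"] == "1" and entry["IRR"] == "1"


for row in ("12 0FF 0F 0 1 0 1 0 1 1 91", "13 0FF 0F 0 1 1 1 0 1 1 99"):
    entry = decode_row(row)
    irq = int(entry["NR"], 16)        # NR is assumed to be printed in hex
    vector = int(entry["Vect"], 16)
    print("IRQ %d vector 0x%02x waiting for EOI: %s"
          % (irq, vector, waiting_for_eoi(entry)))
```

Only the second row, the one shared by the two network cards, is reported as still waiting.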
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 16:53 Manfred Spraul
@ 2001-01-12 17:02 ` David Woodhouse
0 siblings, 0 replies; 90+ messages in thread
From: David Woodhouse @ 2001-01-12 17:02 UTC (permalink / raw)
To: Manfred Spraul; +Cc: frank, linux-kernel
manfred@colorfullife.com said:
> IRR for interrupt 19 is set, that means the IO APIC has sent the
> interrupt to a cpu but not yet received the corresponding EOI.
OK, but couldn't we reset it by sending an extra EOI when the drivers
decide that they've missed interrupts?
--
dwmw2
^ permalink raw reply [flat|nested] 90+ messages in thread
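[Editorial aside] Manfred's answer elsewhere in this thread is that an EOI acts only on the highest bit set in the local APIC's ISR, and is forwarded to the IO APIC only if the same bit is set in the TMR. The toy model below is a sketch of that description, not of real hardware; the class, its methods and the `io_apic_irr` dict are all invented for this example, and treating an EOI with an empty ISR as a no-op is an assumption.

```python
class ToyLocalApic:
    """Toy model of the EOI rule described in this thread (not real hardware)."""

    def __init__(self):
        self.isr = set()   # vectors currently in service
        self.tmr = set()   # vectors whose last delivery was level-triggered

    def accept(self, vector, level_triggered):
        """Record an incoming interrupt and the trigger mode it arrived with."""
        self.isr.add(vector)
        if level_triggered:
            self.tmr.add(vector)
        else:
            self.tmr.discard(vector)

    def eoi(self):
        """Model a write to the EOI register.

        Returns the vector whose EOI is forwarded to the IO APIC,
        or None if nothing is forwarded.
        """
        if not self.isr:
            return None                  # assumed: nothing in service, nothing happens
        vector = max(self.isr)           # highest set ISR bit
        self.isr.remove(vector)
        return vector if vector in self.tmr else None


io_apic_irr = {0x99: True}               # IO APIC has sent 0x99 and awaits its EOI
cpu0 = ToyLocalApic()
cpu0.accept(0x99, level_triggered=False)  # arrived marked as edge, as in the bad case

for _ in range(2):                       # the normal EOI, then an "extra" one
    forwarded = cpu0.eoi()
    if forwarded is not None:
        io_apic_irr[forwarded] = False

print(io_apic_irr[0x99])                 # True: the IO APIC entry stays stuck
```

In this model neither the regular EOI nor the extra one reaches the IO APIC, which is why the extra EOI alone would not be expected to help.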
* QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
@ 2001-01-10 21:30 Frank de Lange
2001-01-10 22:21 ` Manfred Spraul
2001-01-11 11:48 ` Andrew Morton
0 siblings, 2 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-10 21:30 UTC (permalink / raw)
To: linux-kernel
Hi'all,
Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
hang manifests itself as a total failure to communicate through either network
card, and can only be solved by rebooting. Removing and reloading the modules
does not fix the problem, only a reboot works.
I have searched high and low for possible causes, but I found no definite
answer. I suspect the problem may be hardware-related (the BP-6 can be tricky
sometimes), but I want to rule out software issues before I take the system
apart. So, my question: does anyone else experience these network-related
problems with a BP-6 based system? Or maybe other similar problems, where a
specific subsystem hangs and can only be revived by rebooting the box? The
network cards are currently in PCI4 and PCI5 (which should work, they are
NON-busmastering cards after all...), but relocating the cards does not solve
the problem. This problem has been nagging me ever since I moved to 2.3.x
(somewhere around 2.3.30). As it is intermittent, it is very difficult to pin
down. I suspect the APIC is not completely sane, or some timing on the bus is
out of spec, but that's no more than a suspicion. And since I do not have
access to a logic analyzer it is somewhat hard to prove...
Here's a lspci for the box:
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:09.0 Ethernet controller: Winbond Electronics Corp W89C940
00:0b.0 Multimedia video controller: Brooktree Corporation Bt878 (rev 02)
00:0b.1 Multimedia controller: Brooktree Corporation Bt878 (rev 02)
00:0d.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
00:0f.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 06)
00:11.0 Ethernet controller: Winbond Electronics Corp W89C940 (rev 0b)
00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
00:13.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
and here's a cat /proc/<interesting_file>
/proc/cpuinfo shows this box contains dual Celeron 466's (non-overclocked)
/proc/interrupts:
CPU0 CPU1
0: 10003353 9483961 IO-APIC-edge timer
1: 85449 84279 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
3: 167 249 IO-APIC-edge serial
4: 380807 381140 IO-APIC-edge serial
14: 136991 132077 IO-APIC-edge ide0
15: 25836 24605 IO-APIC-edge ide1
16: 42510 42482 IO-APIC-level es1371, mga@PCI:1:0:0
17: 26 26 IO-APIC-level sym53c8xx
18: 9287 8837 IO-APIC-level bttv
19: 205294 205191 IO-APIC-level eth0, eth1, usb-uhci
NMI: 19487238 19487238
LOC: 19488621 19488620
ERR: 0
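[Editorial aside] The shared lines in a listing like the one above can be picked out mechanically. This is only a sketch: it assumes the two-CPU column layout shown here and that devices sharing an IRQ are separated by commas; the function name `shared_irqs` is invented for this example.

```python
def shared_irqs(text, cpus=2):
    """Map IRQ number -> list of devices, for lines listing more than one device."""
    shared = {}
    for line in text.splitlines():
        # Expected fields: "NN:", one count per CPU, controller type, device list.
        fields = line.split(None, cpus + 2)
        if len(fields) < cpus + 3 or not fields[0].rstrip(":").isdigit():
            continue                     # header, NMI/LOC/ERR or incomplete lines
        devices = [d.strip() for d in fields[-1].split(",")]
        if len(devices) > 1:
            shared[int(fields[0].rstrip(":"))] = devices
    return shared


sample = """\
 16: 42510 42482 IO-APIC-level es1371, mga@PCI:1:0:0
 17: 26 26 IO-APIC-level sym53c8xx
 19: 205294 205191 IO-APIC-level eth0, eth1, usb-uhci
NMI: 19487238 19487238
"""
print(shared_irqs(sample))
```

For the lines copied from the listing, IRQ 19 comes out as shared by eth0, eth1 and usb-uhci.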
/proc/meminfo:
total: used: free: shared: buffers: cached:
Mem: 261984256 260354048 1630208 0 12873728 99012608
Swap: 511926272 14245888 497680384
...
...
# network cards share IRQ with USB, which hosts...
/proc/bus/usb/devices:
T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 11/900 us ( 1%), #Int= 1, #Iso= 0
D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 0.00
S: Product=USB UHCI Root Hub
S: SerialNumber=c000
C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=255ms
T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=03f0 ProdID=0401 Rev= 1.00
S: Product=HP ScanJet 5200C
S: SerialNumber=SG95D1720ZHT
C:* #Ifs= 1 Cfg#= 1 Atr=60 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=00(>ifc ) Sub=00 Prot=00 Driver=usbscanner
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl= 0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 16 Ivl= 0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 1 Ivl=250ms
uname -a:
Linux behemoth.localnet 2.4.0 #1 SMP Fri Jan 5 15:41:39 CET 2001 i686 unknown
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-10 21:30 Frank de Lange
@ 2001-01-10 22:21 ` Manfred Spraul
2001-01-10 22:29 ` Frank de Lange
2001-01-10 22:40 ` Frank de Lange
2001-01-11 11:48 ` Andrew Morton
1 sibling, 2 replies; 90+ messages in thread
From: Manfred Spraul @ 2001-01-10 22:21 UTC (permalink / raw)
To: Frank de Lange; +Cc: linux-kernel
Frank de Lange wrote:
>
> Hi'all,
>
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs.
>
Which driver do you use? The driver in 2.4.0 contains several bugfixes.
If that driver still hangs then I'll double check the documentation.
> A
> hang manifests itself as a total failure to communicate through either network
> card, and can only be solved by rebooting. Removing and reloading the modules
> does not fix the problem, only a reboot works.
>
That's different from my problems: unload+reload always fixed my
problems with the unpatched winbond-840 driver.
> which should work, they are
> NON-busmastering cards after all...),
third line in w840_probe1():
pci_set_master().
And the documentation begins with
W89C840F
PCI Bus Master Fast Ethernet LAN Controller.
--
Manfred
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-10 22:21 ` Manfred Spraul
@ 2001-01-10 22:29 ` Frank de Lange
2001-01-10 22:40 ` Frank de Lange
1 sibling, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-10 22:29 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> Which driver do you use? The driver in 2.4.0 contains several bugfixes.
> If that driver still hangs then I'll double check the documentation.
The NE2K PCI one... I'll try to fiddle around with the driver, who knows...
> And the documentation begins with
> W89C840F
> PCI Bus Master Fast Ethernet LAN Controller.
That is the 'F' (Fast) version. My cards are just plain and simple 10base2/T,
humble and non-busmastering (AFAIK).
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-10 22:21 ` Manfred Spraul
2001-01-10 22:29 ` Frank de Lange
@ 2001-01-10 22:40 ` Frank de Lange
1 sibling, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-10 22:40 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> > which should work, they are
> > NON-busmastering cards after all...),
> third line in w840_probe1():
>
> pci_set_master().
>
> And the documentation begins with
> W89C840F
> PCI Bus Master Fast Ethernet LAN Controller.
...in addition to my previous reply, your cards use the Winbond 840 series,
while my cards use the 940 series. Higher number, but a less capable chipset
or so it seems...
Hm, but that reminds me not to get 840's to solve my problems :-)
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-10 21:30 Frank de Lange
2001-01-10 22:21 ` Manfred Spraul
@ 2001-01-11 11:48 ` Andrew Morton
2001-01-11 15:22 ` Frank de Lange
` (5 more replies)
1 sibling, 6 replies; 90+ messages in thread
From: Andrew Morton @ 2001-01-11 11:48 UTC (permalink / raw)
To: Frank de Lange; +Cc: linux-kernel
Frank de Lange wrote:
>
> Hi'all,
>
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
> hang manifests itself as a total failure to communicate through either network
> card, and can only be solved by rebooting. Removing and reloading the modules
> does not fix the problem, only a reboot works.
>
Losing both NICs at the same time could be the elusive "APIC
stops generating interrupts" problem.
Do you get any transmit timeout messages in the logs? If
so, send them.
Does it happen with a uniprocessor build?
Are you able to boot with the `noapic' LILO option? If so,
does that make it stop?
-
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
@ 2001-01-11 15:22 ` Frank de Lange
2001-01-11 16:55 ` Frank de Lange
` (4 subsequent siblings)
5 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 15:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thu, Jan 11, 2001 at 10:48:23PM +1100, Andrew Morton wrote:
> Losing both NICs at the same time could be the elusive "APIC
> stops generating interrupts" problem.
Yup, that's what I thought... But the real question is, is this a
software/configuration problem or a hardware problem which can only be fixed
by physically changing something on the board?... As it is, as you call it,
'elusive', it is a b*tch to pinpoint the source of these problems...
> Do you get any transmit timeout messages in the logs? If
> so, send them.
Here they are (marked with ***):
grep -B2 -A2 transmit /var/log/messages:
Jan 10 22:24:47 behemoth kernel: usb_control/bulk_msg: timeout
Jan 10 22:24:50 behemoth kernel: usb_control/bulk_msg: timeout
*** Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:03 behemoth last message repeated 7 times
Jan 10 22:57:03 behemoth kernel: SysRq: Emergency Sync
--
Jan 10 22:57:09 behemoth kernel: Syncing device 16:07 ... OK
Jan 10 22:57:09 behemoth kernel: Done.
*** Jan 10 22:57:09 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:09 behemoth kernel: SysRq: Emergency Sync
Jan 10 22:57:09 behemoth kernel: Syncing device 03:01 ... OK
> Does it happen with a uniprocessor build?
Not tried yet, since I wanna use both CPU's :-).
> Are you able to boot with the `noapic' LILO option?
I am, and did it a while ago. As far as I remember, it did not make it stop...
I'll try again (even though it is not a real solution, since that APIC is there
for a reason...)
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
2001-01-11 15:22 ` Frank de Lange
@ 2001-01-11 16:55 ` Frank de Lange
2001-01-11 19:18 ` Frank de Lange
` (3 subsequent siblings)
5 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 16:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
> Do you get any transmit timeout messages in the logs? If
> so, send them.
In addition to my previous message, here's what I get from the debug log
facility:
Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=33.
Jan 10 22:56:52 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:52 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=26.
Jan 10 22:56:53 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:53 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=30.
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=78.
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=32.
Jan 10 22:56:58 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:58 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=89.
Jan 10 22:57:00 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:00 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=77.
Jan 10 22:57:03 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:03 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=171.
So yeah, I get timeouts all right...
Currently running NOAPIC, pity to see CPU1 receiving no interrupts at all...
In the same debug log I now just saw this:
Jan 11 17:37:05 behemoth kernel: spurious 8259A interrupt: IRQ7
That's weird, since there's nothing there...:
cat /proc/interrupts
CPU0 CPU1
0: 232967 0 XT-PIC timer
1: 6424 0 XT-PIC keyboard
2: 0 0 XT-PIC cascade
3: 138 0 XT-PIC serial
4: 46201 0 XT-PIC serial
9: 52 0 XT-PIC sym53c8xx
10: 744329 0 XT-PIC eth0, eth1, usb-uhci
11: 0 0 XT-PIC bttv
12: 0 0 XT-PIC es1371, mga@PCI:1:0:0
14: 19778 0 XT-PIC ide0
15: 4520 0 XT-PIC ide1
NMI: 0 0
LOC: 232916 232914
ERR: 1
See? Nothing on 7... This is with NOAPIC (as you can see from the XT-PIC's in
the above dump). BP6 again?
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
2001-01-11 15:22 ` Frank de Lange
2001-01-11 16:55 ` Frank de Lange
@ 2001-01-11 19:18 ` Frank de Lange
[not found] ` <3A5E0849.EB428D70@mandrakesoft.com>
2001-01-11 19:38 ` Frank de Lange
` (2 subsequent siblings)
5 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 19:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Here's another posting to the list which mentions problems with NE2K and BP6:
http://web.gnu.walfield.org/mail-archive/linux-kernel/2000-August/0132.html
"...In another machine, a dual celeron abit-bp6, recent 2.3.x kernels seem to
dislike my realtek 8029 NIC. (I know, it's garbage plugged in to garbage...)
The network card will die randomly, usually when I'm sending large amounts of
data. When it dies, there are no kernel messages, and the interrupt count in
/proc/interrupts for the card stop changing. Minor (painful) experimentation
has shown that if the card is sharing the interrupt with anything else (say,
ide2), it takes that with it. This only happens in "newer" kernels, it's fine
in 2.2.16, and in some earlier 2.3.x kernels. It goes away if I boot with the
noapic=1 kernel parameter, and seems to be replaced with harmless "spurious
8259A interrupt: IRQ7." messages. (I haven't configured any hardware at all to
be on IRQ7 - though I'm lead to believe IRQ7 has some sort of special purpose)
..."
So I'm not the only one...
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
[parent not found: <3A5E0849.EB428D70@mandrakesoft.com>]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
[not found] ` <3A5E0849.EB428D70@mandrakesoft.com>
@ 2001-01-12 0:28 ` Frank de Lange
2001-01-12 11:40 ` Andrew Morton
0 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 0:28 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel
On Thu, Jan 11, 2001 at 02:23:53PM -0500, Jeff Garzik wrote:
> Just out of curiosity, if you boot a Linux 2.4.0 kernel with the
> "noapic" command line option, does behavior improve?
For the curious, here's a summary of some tests I did:
apic, 2 cpu's, no smp affinity -> network hangs under load
apic, maxcpus=1, no smp affinity -> network hangs under load
apic, 2 cpu's, smp affinity for all irq's on CPU1 -> network hangs under load
noapic, 2 cpu's, no smp affinity -> NO HANG, WORKSFORME
Quick and dirty conclusion: as soon as the apic comes in to play, things get
messy...
ps. load == 2 simultaneous nfs cp -rd <big_directory> sessions and streaming
esd audio over the network
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 0:28 ` Frank de Lange
@ 2001-01-12 11:40 ` Andrew Morton
2001-01-12 15:06 ` Frank de Lange
2001-01-12 15:36 ` Frank de Lange
0 siblings, 2 replies; 90+ messages in thread
From: Andrew Morton @ 2001-01-12 11:40 UTC (permalink / raw)
To: Frank de Lange; +Cc: Maciej W. Rozycki, linux-kernel
Frank de Lange wrote:
>
> Quick and dirty conclusion: as soon as the apic comes in to play, things get
> messy...
Yup.
Frank, for over a year there have been sporadic reports of
APIC's forgetting how to deliver interrupts. Not only
on BP6's. Often with 3com NICs, so I've never been 100% sure
that it's not a failure in the NIC.
In your case, you have three devices on the same IRQ and they
all go to lunch at the same time. That's pretty convincing.
Nobody has been able to repeat this frequently enough for
any useful debugging to be done. Don't go away!
Here is a debugging patch. Could you please apply this,
rebuild and:
1: Type ALT-SYSRQ-A when everything is good
2: Type ALT-SYSRQ-A when everything is bad
3: send the resulting logs.
I've Cc'ed Maciej, who understands this stuff.
--- linux-2.4.0-test11.macro/arch/i386/kernel/io_apic.c Thu Oct 5 21:08:17 2000
+++ linux-2.4.0-test11/arch/i386/kernel/io_apic.c Sun Nov 26 12:39:01 2000
@@ -692,7 +692,7 @@ void __init UNEXPECTED_IO_APIC(void)
 	printk(KERN_WARNING " to linux-smp@vger.kernel.org\n");
 }
 
-void __init print_IO_APIC(void)
+void /*__init*/ print_IO_APIC(void)
 {
 	int apic, i;
 	struct IO_APIC_reg_00 reg_00;
diff -up --recursive --new-file linux-2.4.0-test11.macro/drivers/char/sysrq.c linux-2.4.0-test11/drivers/char/sysrq.c
--- linux-2.4.0-test11.macro/drivers/char/sysrq.c Tue Nov 14 10:24:52 2000
+++ linux-2.4.0-test11/drivers/char/sysrq.c Sun Nov 26 12:42:11 2000
@@ -72,6 +72,15 @@ void handle_sysrq(int key, struct pt_reg
 	console_loglevel = 7;
 	printk(KERN_INFO "SysRq: ");
 	switch (key) {
+	case 'a':
+		printk("\n");
+		printk("print_PIC()\n");
+		print_PIC();
+		printk("print_IO_APIC()\n");
+		print_IO_APIC();
+		printk("print_all_local_APICs()\n");
+		print_all_local_APICs();
+		break;
 	case 'r': /* R -- Reset raw mode */
 		if (kbd) {
 			kbd->kbdmode = VC_XLATE;
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-12 11:40 ` Andrew Morton
@ 2001-01-12 15:06 ` Frank de Lange
2001-01-12 15:36 ` Frank de Lange
1 sibling, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-12 15:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Maciej W. Rozycki, linux-kernel
On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Frank de Lange wrote:
> >
> > Quick and dirty conclusion: as soon as the apic comes in to play, things get
> > messy...
> Here is a debugging patch. Could you please apply this,
> rebuild and:
>
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.
WillCo... Now rebuilding...
Cheers//Frank
--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ frank@unternet.org /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? 2001-01-12 11:40 ` Andrew Morton 2001-01-12 15:06 ` Frank de Lange @ 2001-01-12 15:36 ` Frank de Lange 1 sibling, 0 replies; 90+ messages in thread From: Frank de Lange @ 2001-01-12 15:36 UTC (permalink / raw) To: Andrew Morton; +Cc: Maciej W. Rozycki, linux-kernel On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote: > Here is a debugging patch. Could you please apply this, > rebuild and: > > 1: Type ALT-SYSRQ-A when everything is good > 2: Type ALT-SYSRQ-A when everything is bad > 3: send the resulting logs. OK, here's the results I get... Before network hang =================== print_PIC() printing PIC contents print_IO_APIC() testing the IO APIC....................... .................................... done. print_all_local_APICs() ... APIC ID: 01000000 (1) ... APIC VERSION: 00040011 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000010000000000000000 ... APIC ID: 00000000 (0) ... 
APIC VERSION: 00040011 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000010000000000000001000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000001000 NOTICE: results differ every time I hit ALT-SYSRQ-A. The '1' bit at 'row 11, col. 26' stays '1' no matter how many times I use the magic keys. The other '1' bits jump around a bit, or disappear alltogether. Also, the sequence in which the APICs appear in the dump sometimes differs (this example shows 1 first, then 0, other times you'd see 0 first, then 1) After network hang ================== print_PIC() printing PIC contents print_IO_APIC() testing the IO APIC....................... .................................... done. print_all_local_APICs() ... APIC ID: 00000000 (0) ... 
APIC VERSION: 00040011 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000010000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000010000000000000000 ... APIC ID: 01000000 (1) ... APIC VERSION: 00040011 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000010000000000000000 NOTICE: hmmm... see, now that '1' bit at row 11, col. 26 for APIC 0 which was '1' before has turned to '0'. It will stay '0' no matter how many times I hit the magic keys... It seems to have been replaced by the '1' bit at row 11, col. 10, since that bit stays '1' no matter how many magic I throw at it... Hope this helps... If you need more, let me know... 
Cheers//Frank
--
 WWWWW      _______________________
## o o\    /     Frank de Lange    \
}#   \|   /                         \
 ##---# _/     <Hacker for Hire>     \
  ####   \      +31-320-252965       /
          \    frank@unternet.org   /
           -------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur et non datur,
  nondum habetur, quomodo habenda est." ]
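The before/after comparison Frank makes by eye (which '1' bits in a row stay put and which vanish) can also be done mechanically. What follows is a minimal Python sketch, not part of the original mail. The helper names `set_columns` and `diff_rows` are made up for illustration, and columns are counted from 1, left to right, the way the message counts them. The two sample strings are row 11 of the APIC 0 dump before and after the hang, copied from the output above.

```python
def set_columns(row):
    """Return the 1-based columns that hold a '1' in one 32-bit dump row."""
    return {col for col, bit in enumerate(row, start=1) if bit == "1"}


def diff_rows(before, after):
    """Compare the same row from two dumps and classify the set bits."""
    b, a = set_columns(before), set_columns(after)
    return {
        "cleared": sorted(b - a),    # was '1', now '0'
        "appeared": sorted(a - b),   # was '0', now '1'
        "unchanged": sorted(a & b),  # '1' in both dumps
    }


# Row 11 for APIC 0, before and after the network hang.
before = "00000000010000000000000001000000"
after = "00000000010000000000000000000000"

result = diff_rows(before, after)
print(result)  # {'cleared': [26], 'appeared': [], 'unchanged': [10]}
```

Run over every row of two dumps, the same comparison would separate bits that merely "jump around" between key presses from bits that change state once and then stay that way.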
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
` (2 preceding siblings ...)
2001-01-11 19:18 ` Frank de Lange
@ 2001-01-11 19:38 ` Frank de Lange
2001-01-11 19:49 ` Frank de Lange
2001-01-11 21:09 ` Frank de Lange
5 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 19:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Hm, the noapic option seems to help, as I'm currently beating the network
to death but it won't die... As the problem is elusive, it is hard to
tell, and it would not surprise me if the net dropped dead the moment this
mail went through, but the current indication is that noapic makes the
sudden net-death disappear.

So.... we're still left with the question 'is this hardware-related, or is
it a software/configuration problem'? Other people seem to have similar
problems with dissimilar hardware (tulip cards instead of Winbond, etc),
on 2.2.x as well as 2.3/4.x.

As I do not run Windows (NT or 2K), I cannot tell if this problem also
occurs there. And my FreeBSD-box is uniprocessor... So... has anyone seen
anything like this on other 'true' (SMP) OSes? If so, that would indicate
a hardware problem...

Cheers//Frank
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
` (3 preceding siblings ...)
2001-01-11 19:38 ` Frank de Lange
@ 2001-01-11 19:49 ` Frank de Lange
2001-01-11 21:09 ` Frank de Lange
5 siblings, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 19:49 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Another observation wrt. behaviour with 'noapic'...

When streaming time-critical data over the network (running esound to
another server, etc), sometimes there are hiccups in the stream. These
hiccups seem to be much less frequent, if at all present, when running
with 'noapic'. I'm currently running sound over a heavily loaded ethernet,
no hiccups at all...

Weird, since the apic ought to spread the load of handling the interrupts
over all available CPUs. Whatever is causing this, there seems to be
something fishy in the way interrupts are handled when the apic(s) is/are
enabled...

Cheers//Frank
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 11:48 ` Andrew Morton
` (4 preceding siblings ...)
2001-01-11 19:49 ` Frank de Lange
@ 2001-01-11 21:09 ` Frank de Lange
2001-01-11 21:47 ` Jeff Garzik
5 siblings, 1 reply; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 21:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

OK, just one last addition to what has nearly become my own thread...

I now am fairly certain that the problem (network stalls on multiprocessor
systems) is not BP6 or NE2K-PCI specific. I found several postings which
relate to similar problems on dissimilar hardware. Another interesting one
is:

Re: PROBLEM : Networking stops working with kernel 2.4.0-test11
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg18722.html)

"...I have an almos identical system as you, 2x200MMX motherboard
(Gigabyte 586DX) also Voodoo3 (2000 pci) the same nic Realtek 8029AS, also
a bt848 tv card, also SCSI (Aic-7880 onboard, but not used). I have
reported it some time ago, and now all I get with 2.4.0-test11-pre4 and I
think a additional patch is NETDEV WATCHDOG: eth0: transmit timed out, and
something in the console about lost irq? I can't reproduce it with a
uniprocesor kernel, and I have a 3c503 card wich uses the 8390 module, so
I suppose that the problem it's not in the 8390, and it seems to be smp
related...."

ne2k-pci freezes with APIC error on 2.4.0-testX SMP
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg14468.html)

"... When doing massive NFS transfers (2.4 machine as the client) on my
SMP box (Abit BP6 2x celeronA 533mhz (non-overclocked) 64Mb ram, latest
apt-get-ed debian woody) my ne2k-pci card (Realtek Semiconductor Co., Ltd.
RTL-8029(AS) (rev 0)) suddenly stops working. test5 spits that in
syslog:..."

More to be found when searching the archives.
This problem has been around for a long, long time (probably since the
current level of apic-support was added, somewhere around 2.3.1x?). It has
been reported by several people, several times. I feel like rigging every
apic-related piece of code with a zillion bells and printk's but that
would surely only create more mayhem as this whole thing seems to be
timing-related...

Anyone got any ideas on how to tackle this? Anyone who is 'intimate with'
the apic-related code? It'll take me some time to dive into that part, so
if there is anyone who already has taken the plunge, do tell...

Cheers//Frank [ who is still running apic-less, without problems ]
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 21:09 ` Frank de Lange
@ 2001-01-11 21:47 ` Jeff Garzik
2001-01-11 21:53 ` Frank de Lange
2001-01-12 14:35 ` David Woodhouse
0 siblings, 2 replies; 90+ messages in thread
From: Jeff Garzik @ 2001-01-11 21:47 UTC (permalink / raw)
To: Frank de Lange; +Cc: Andrew Morton, linux-kernel

Frank de Lange wrote:
>
> OK, just one last addition to what has nearly become my own thread...
>
> I now am fairly certain that the problem (network stalls on multiprocessor systems) is not BP6 or NE2K-PCI specific. I found several postings which relate to similar problems on dissimilar hardware. Another interesting one is:

> I have reported it some time ago, and now all I get with
> 2.4.0-test11-pre4 and I think a additional patch is NETDEV WATCHDOG:
> eth0: transmit timed out, and something in the console about lost irq?

Are you judging based on the error message? The 'netdev watchdog ...'
message is a generic error message that could have any number of
causes. It's just saying, well, what it says :) The kernel was unable
to transmit a packet in a certain amount of time. You might get these
messages if you unplug a cable suddenly, or if your hardware isn't
delivering interrupts, or many other things...

	Jeff

--
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie
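Jeff's point, that the watchdog message only says a transmit did not finish in time and says nothing about why, can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration and not the kernel's actual watchdog code; the class `TxWatchdog`, its method names, and the 5-second timeout are all invented for the example.

```python
class TxWatchdog:
    """Toy model of a transmit watchdog (not the kernel's implementation)."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds a transmit may stay pending
        self.pending_since = None   # start time of the oldest unfinished transmit

    def start_xmit(self, now):
        # Remember only the oldest outstanding packet.
        if self.pending_since is None:
            self.pending_since = now

    def tx_done(self):
        # Stands in for the transmit-complete interrupt handler.
        self.pending_since = None

    def timed_out(self, now):
        # True when a transmit has been pending longer than the timeout.
        return (self.pending_since is not None
                and now - self.pending_since > self.timeout)


wd = TxWatchdog(timeout=5.0)

wd.start_xmit(now=0.0)
wd.tx_done()                       # completion arrived
healthy = wd.timed_out(now=10.0)   # False

wd.start_xmit(now=10.0)            # completion never arrives
stalled = wd.timed_out(now=16.0)   # True
```

The model fires identically whether the completion was lost because a cable was pulled or because an interrupt was never delivered, which is why the message alone cannot identify the cause.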
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 21:47 ` Jeff Garzik
@ 2001-01-11 21:53 ` Frank de Lange
2001-01-12 14:35 ` David Woodhouse
1 sibling, 0 replies; 90+ messages in thread
From: Frank de Lange @ 2001-01-11 21:53 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel

On Thu, Jan 11, 2001 at 04:47:00PM -0500, Jeff Garzik wrote:
> Are you judging based on the error message? The 'netdev watchdog ...'
> message is a generic error message that could have any number of
> causes. It's just saying, well, what it says :) The kernel was unable
> to transmit a packet in a certain amount of time. You might get these
> messages if you unplug a cable suddenly, or if your hardware isn't
> delivering interrupts, or many other things...

No, I'm judging based on the fact that I found reports from people using
NE2K-PCI with several cards as well as tulip-based cards (different
driver) on abit BP6 as well as Gigabyte motherboards, mostly on
2.3.x/2.4.x kernels. I found some postings with these problems on 2.2.x
kernels.

Cheers//Frank
* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
2001-01-11 21:47 ` Jeff Garzik
2001-01-11 21:53 ` Frank de Lange
@ 2001-01-12 14:35 ` David Woodhouse
1 sibling, 0 replies; 90+ messages in thread
From: David Woodhouse @ 2001-01-12 14:35 UTC (permalink / raw)
To: Frank de Lange; +Cc: Jeff Garzik, Andrew Morton, linux-kernel

frank@unternet.org said:
> No, I'm judging based on the fact that I found reports from people
> using NE2K-PCI with several cards as well as tulip-based cards
> (different driver) on abit BP6 as well as Gigabyte motherboards,
> mostly on 2.3.x/2.4.x kernels. I found some postings with these
> problems on 2.2.x kernels.

IRQ 19 on my BP6 stopped arriving a few days ago.

 19:      90373      90473   IO-APIC-level  usb-uhci

Removing and reloading the usb-uhci driver didn't help. Loading the uhci
driver just oopsed, which seems to be its normal behaviour on the
occasions on which I try it. Rebooting fixed it.

I was half tempted to code a 'kick APIC because I think it broke'
function, but then decided not to bother.

It might be nice in 2.5 to give drivers some way of kicking the APIC when
they think they've missed an interrupt, much like the network code kicks
the driver. And to deal more gracefully with IRQ storms. Once a driver has
a way of saying "Oi! Why isn't IRQ x working?" it would be feasible to
just disable the damn thing if we receive a million of them in rapid
succession.

--
dwmw2
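The "disable it if we receive a million of them in rapid succession" idea from David's message amounts to counting interrupts inside a sliding time window. The following Python sketch is only one way such a policy could look and is not taken from any kernel code; the class `IrqStormDetector`, its parameters, and the thresholds (1000 interrupts per second rather than a million) are assumptions chosen to keep the example small.

```python
from collections import deque


class IrqStormDetector:
    """Sliding-window counter that disables a line on too many interrupts."""

    def __init__(self, max_irqs, window):
        self.max_irqs = max_irqs   # interrupts tolerated per window
        self.window = window       # window length in seconds
        self.stamps = deque()      # arrival times still inside the window
        self.disabled = False

    def on_irq(self, now):
        """Record one interrupt; return True if the line is still enabled."""
        if self.disabled:
            return False
        self.stamps.append(now)
        # Forget arrivals that have fallen out of the window.
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        if len(self.stamps) > self.max_irqs:
            self.disabled = True
        return not self.disabled


det = IrqStormDetector(max_irqs=1000, window=1.0)

# Normal load: one interrupt every 10 ms.
for i in range(1000):
    det.on_irq(i * 0.01)
calm = det.disabled    # False

# Storm: 1001 interrupts one microsecond apart.
for i in range(1001):
    det.on_irq(20.0 + i * 1e-6)
storm = det.disabled   # True
```

A real policy would also need a way to re-enable the line, which is where the proposed "kick the APIC" hook for drivers would come in.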
end of thread, other threads:[~2001-01-16 19:25 UTC | newest]
Thread overview: 90+ messages
2001-01-12 17:16 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Manfred Spraul
2001-01-12 17:33 ` Frank de Lange
2001-01-12 17:51 ` Manfred Spraul
2001-01-12 18:25 ` Frank de Lange
2001-01-12 19:04 ` Manfred Spraul
2001-01-12 19:07 ` Frank de Lange
2001-01-12 19:21 ` Frank de Lange
2001-01-12 19:33 ` Manfred Spraul
2001-01-12 19:52 ` Frank de Lange
2001-01-12 19:59 ` Linus Torvalds
2001-01-12 20:03 ` Ingo Molnar
2001-01-14 0:13 ` Roeland Th. Jansen
2001-01-14 0:23 ` Frank de Lange
2001-01-12 20:05 ` Frank de Lange
2001-01-12 20:11 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated? Manfred Spraul
2001-01-12 20:16 ` Frank de Lange
2001-01-12 20:19 ` Ingo Molnar
2001-01-12 20:26 ` Frank de Lange
2001-01-12 20:31 ` Ingo Molnar
2001-01-12 20:35 ` Frank de Lange
2001-01-12 20:37 ` Ingo Molnar
2001-01-12 20:46 ` David Woodhouse
2001-01-12 20:46 ` Frank de Lange
2001-01-12 20:51 ` Ingo Molnar
2001-01-12 21:05 ` Frank de Lange
2001-01-15 2:00 ` Jorge Nerin
2001-01-13 0:15 ` Linus Torvalds
2001-01-13 0:19 ` Frank de Lange
2001-01-13 0:29 ` Alan Cox
2001-01-12 20:54 ` Manfred Spraul
2001-01-12 21:07 ` Frank de Lange
2001-01-12 21:31 ` Manfred Spraul
2001-01-12 23:50 ` Alan Cox
2001-01-12 21:21 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
2001-01-12 23:35 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware Alan Cox
2001-01-13 0:06 ` Manfred Spraul
2001-01-13 0:36 ` Linus Torvalds
2001-01-13 0:48 ` Frank de Lange
2001-01-13 0:56 ` Linus Torvalds
2001-01-13 1:27 ` Frank de Lange
2001-01-13 1:51 ` Manfred Spraul
2001-01-13 2:11 ` Frank de Lange
2001-01-13 1:49 ` Jens Axboe
2001-01-13 2:12 ` Andrew Morton
2001-01-13 2:48 ` Linus Torvalds
2001-01-13 3:24 ` Andrew Morton
2001-01-13 12:37 ` Russell King
2001-01-13 15:18 ` Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...) Manfred Spraul
2001-01-13 23:55 ` Manfred Spraul
2001-01-14 0:18 ` Call for testers: ne2k-pci and io apic J . A . Magallon
2001-01-14 0:23 ` Call for testers: ne2k-pci and io apic (was: Re: QUESTION: Network hangs with BP6...) J . A . Magallon
2001-01-14 2:14 ` Call for testers: ne2k-pci and io apic J . A . Magallon
2001-01-15 16:15 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware Zdenek Kabelac
2001-01-13 1:38 ` Manfred Spraul
2001-01-13 2:10 ` Andrew Morton
2001-01-12 17:49 ` Alan Cox
2001-01-12 18:08 ` Manfred Spraul
2001-01-12 18:16 ` Ingo Molnar
2001-01-12 18:45 ` Manfred Spraul
2001-01-12 18:48 ` Ingo Molnar
2001-01-12 18:28 ` Linus Torvalds
2001-01-12 23:27 ` Alan Cox
2001-01-13 0:35 ` Linus Torvalds
2001-01-13 0:43 ` Alan Cox
2001-01-13 0:48 ` Linus Torvalds
2001-01-12 19:05 ` Frank de Lange
2001-01-12 20:04 ` Linus Torvalds
2001-01-15 14:36 ` Roeland Th. Jansen
2001-01-12 22:03 ` Latest status of IDE patches from Andre Jeff Nguyen
[not found] <20010112165104.A22465@unternet.org>
2001-01-16 19:23 ` QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Maciej W. Rozycki
-- strict thread matches above, loose matches on Subject: below --
2001-01-12 16:53 Manfred Spraul
2001-01-12 17:02 ` David Woodhouse
2001-01-10 21:30 Frank de Lange
2001-01-10 22:21 ` Manfred Spraul
2001-01-10 22:29 ` Frank de Lange
2001-01-10 22:40 ` Frank de Lange
2001-01-11 11:48 ` Andrew Morton
2001-01-11 15:22 ` Frank de Lange
2001-01-11 16:55 ` Frank de Lange
2001-01-11 19:18 ` Frank de Lange
[not found] ` <3A5E0849.EB428D70@mandrakesoft.com>
2001-01-12 0:28 ` Frank de Lange
2001-01-12 11:40 ` Andrew Morton
2001-01-12 15:06 ` Frank de Lange
2001-01-12 15:36 ` Frank de Lange
2001-01-11 19:38 ` Frank de Lange
2001-01-11 19:49 ` Frank de Lange
2001-01-11 21:09 ` Frank de Lange
2001-01-11 21:47 ` Jeff Garzik
2001-01-11 21:53 ` Frank de Lange
2001-01-12 14:35 ` David Woodhouse