* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, har
From: Roeland Th. Jansen @ 2001-01-15 18:42 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: linux-kernel, torvalds

On Mon, Jan 15, 2001 at 06:45:06PM +0000, Petr Vandrovec wrote:
> I think that on BP6 hardware there is no way around it except using 'noapic',
> or passing the board through the Abit replacement program. There is only a
> two-bit checksum, which guards 8 or 22 data bits. I have no idea how frequent
> two-bit errors are, but, as your example shows, they definitely happen on
> your hardware.

Thanks for the explanation. I am running with noapic right now and the
machine hasn't died yet. I looked at the IRQ stuff and decided that I
probably don't need it anyway.

Are there newer boards known not to have this problem?
(Please reply to bengel@grobbebol.xs4all.nl)

-- 
Grobbebol's Home                   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |        on Usenet.             _\_v  
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, har
From: Petr Vandrovec @ 2001-01-15 18:45 UTC (permalink / raw)
  To: Roeland Th. Jansen; +Cc: linux-kernel, torvalds

On 15 Jan 01 at 14:36, Roeland Th. Jansen wrote:
> On Fri, Jan 12, 2001 at 12:04:21PM -0800, Linus Torvalds wrote:
> > Ok, so it's tentatively the IOAPIC disable/enable code.  But it could
> > obviously be something that just interacts with it, including just a
> > timing issue (ie the _real_ bug might just be bad behaviour when
> > changing IO-APIC state at the same time as an interrupt happens, and
> > disable/enable-irq just happen to be the only things that do it at a
> > high enough frequency that you can see the problem). 
> 
> my BP6 with the patch frank sent me and the apic code at line 273 (or
> so) defined as '1' and a flood ping :
> 
> Jan 14 19:56:19 grobbebol kernel: APIC error on CPU1: 02(02)
> Jan 14 19:56:25 grobbebol kernel: APIC error on CPU1: 02(02)
> Jan 14 19:58:10 grobbebol last message repeated 2 times
> Jan 14 20:00:01 grobbebol kernel: APIC error on CPU1: 02(02)
> Jan 14 20:01:11 grobbebol last message repeated 2 times
> Jan 14 20:01:48 grobbebol kernel: APIC error on CPU1: 02(02)
> Jan 14 20:01:59 grobbebol kernel: APIC error on CPU1: 02(08)
> Jan 14 20:02:10 grobbebol kernel: APIC error on CPU1: 08(08)
> Jan 14 20:02:39 grobbebol kernel: APIC error on CPU1: 08(02)
> Jan 14 20:02:39 grobbebol kernel: unexpected IRQ trap at vector 8d
> Jan 14 20:15:32 grobbebol kernel: APIC error on CPU1: 02(08)
> [....]
> and the network is dead. however, no crashes seen during this.

It is expected. An inter-APIC message finally got damaged in such a way
that the checksum was still OK, but the IRQ vector got mangled from 99 -> 8d
(I bet that it was 99->8d, as both have the same checksum, and 99 could
be in use...). So the local APIC confirmed reception of interrupt 8d,
but interrupt 8d was never requested by the IOAPIC :-( So the 8d confirmation
is dropped into the wastebasket, while IRQ 99 is still marked as being
serviced in the IOAPIC, but was never seen/EOIed by the CPU.
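
The collision described above can be checked with a small sketch. It assumes the two-bit checksum is the sum of the payload's 2-bit groups modulo 4; the message itself only says "two bit checksum", so the exact scheme and the names `pair_checksum` and `undetected_two_bit_errors` are assumptions made for illustration.

```python
def pair_checksum(value, width=8):
    """Sum the value's 2-bit groups modulo 4 (ASSUMED checksum scheme)."""
    total = 0
    for shift in range(0, width, 2):
        total += (value >> shift) & 0b11
    return total % 4


def undetected_two_bit_errors(value, width=8):
    """Return every two-bit corruption of value that keeps the same checksum."""
    good = pair_checksum(value, width)
    hits = []
    for i in range(width):
        for j in range(i + 1, width):
            corrupted = value ^ (1 << i) ^ (1 << j)
            if pair_checksum(corrupted, width) == good:
                hits.append(corrupted)
    return hits


if __name__ == "__main__":
    # Vectors from the log above, written in hex.
    print(hex(0x99), pair_checksum(0x99))
    print(hex(0x8D), pair_checksum(0x8D))
    print([hex(v) for v in undetected_two_bit_errors(0x99)])
```

Under this assumed scheme 0x99 and 0x8d differ in exactly two bits and produce the same checksum, while any single-bit flip changes it. Passing `width=22` applies the same check to the longer message format.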

For such a motherboard you have two choices: (1) do not use the IOAPIC at all
(when LINT#0/#1 are used in 8259 mode, they are not so sensitive to
electrical noise) or (2) apply another (frank's?) patch which resets the IRQ
line every few seconds. Maybe hook this reinitialization into the NE2K
timeout hook... Or into a userspace daemon that triggers when the received
packet count does not climb for a couple of seconds...
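
The userspace-daemon idea could look roughly like the sketch below. It polls the receive-packet counter in `/proc/net/dev` and calls a hook once the counter has stopped changing for a few seconds. The hook `on_stall` is hypothetical: the thread does not say how the IRQ line reset would be triggered from userspace, so that part is left to the caller. The names `rx_packets`, `StallDetector`, and `watch`, and the default interface `eth0`, are also assumptions.

```python
import time


def rx_packets(proc_net_dev_text, iface):
    """Return the received-packet counter for iface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            return int(fields[1])  # field 0 is rx bytes, field 1 is rx packets
    raise KeyError(iface)


class StallDetector:
    """Report a stall once the counter has not changed for `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_count = None
        self.last_change = None

    def update(self, count, now):
        # Any change (including a counter wrap) counts as progress.
        if self.last_count is None or count != self.last_count:
            self.last_count = count
            self.last_change = now
            return False
        return (now - self.last_change) >= self.timeout


def watch(iface="eth0", timeout=5.0, poll=1.0, on_stall=None):
    """Poll forever; call the hypothetical on_stall(iface) hook on a stall."""
    detector = StallDetector(timeout)
    while True:
        with open("/proc/net/dev") as f:
            count = rx_packets(f.read(), iface)
        if detector.update(count, time.monotonic()) and on_stall:
            on_stall(iface)
            detector = StallDetector(timeout)  # fresh window after the reset
        time.sleep(poll)
```

A quiet network also looks like a stall to this detector, so in practice it would fit best on a machine with steady inbound traffic, such as the flood ping test quoted above.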

I think that on BP6 hardware there is no way around it except using 'noapic',
or passing the board through the Abit replacement program. There is only a
two-bit checksum, which guards 8 or 22 data bits. I have no idea how frequent
two-bit errors are, but, as your example shows, they definitely happen on
your hardware.
                                                Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
