public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Severe IRQ problems on Foster (P4 Xeon) system
@ 2002-03-13 19:11 Martin Wilck
  2002-03-13 19:10 ` Ingo Molnar
  2002-03-19 14:32 ` Maciej W. Rozycki
  0 siblings, 2 replies; 32+ messages in thread
From: Martin Wilck @ 2002-03-13 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Maciej W. Rocycki; +Cc: Linux Kernel mailing list, Martin Wilck


Hi Ingo and Maciej, everybody,

I am contacting you as the Linux experts for (IO)-Apics.

We are currrently testing our new Foster (Pentium 4 Xeon with HT
technology) machines and facing really weird problems particularly
with the timer interrupt. The machine I am talking about has 2 physical,
i.e. 4 logical CPUs. It has 4 dual-port Intel Pro/100 network cards
built in.

** Before you read further: I cannot 100% exclude a hardware/BIOS
   error yet, but something Linux does must have to do with it **

First of all, we see that virtually 100% of all IRQs are handled by CPU 0.
I have seen this reported a number of times before. I guess it can become
a severe performance problem in IRQ-intensive situations.

But much worse, we encounter the following problem:

- sporadically the timer interrupt becomes disabled sometime after
  booting (typically around X startup).

- This is caused by a "junk" value that appears in the mask register
  of the master 8259a PIC (the system is working in ExtINT mode for timer
  IRQs.

- The system can be "healed" by rewriting the correct value into register
  0x21.

- We see with a bus logic analyzer that the junk value is read after the
  completely legal mask 0xfa was last written to the register.
  However according to my understanding that register should not be
  written after initialization, because IRQ are disabled through the
  APIC, not the 8259a?

- The junk value always appears after CPU 1 gets a timer interrupt.
  As stated above, this hardly happens at all, but CPU 1 seems to get
  2-6 timer interrupts during each boot. These are all consecutive,
  and starting with the second one the "junk" value is found in io-port
  0x21.

Here is an excerpt from my syslog demonstrating this (I put some printk's
in do_IRQ, printk is triggered if either a CPU other than 0 serves the
IRQ or if the value in port 0x21 != 0xfa, lines ending):

                                 jiffies | 0x21        desc->status|
                                         V   V                     V
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8931: fa, CPU 1 IRQ 0 status 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8931: fa, CPU 1 IRQ 0 stat 5
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8932: fa, CPU 1 IRQ 0 status 1  <==
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8932: 07, CPU 1 IRQ 0 stat 5    <==
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8934: 07, CPU 1 IRQ 0 status 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8934: 07, CPU 0 IRQ 0 stat 0    <==
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8935: 07, CPU 1 IRQ 0 stat 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8935: 07, CPU 1 IRQ 0 status 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8936: 07, CPU 1 IRQ 0 stat 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8936: 07, CPU 1 IRQ 0 status 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8937: 07, CPU 1 IRQ 0 stat 0
Mar 12 16:51:40 pdb0374c kernel: IRQ1: 8937: 07, CPU 0 IRQ 28 stat 0
Mar 12 16:51:40 pdb0374c last message repeated 42 times
[ no more timer interrupts below here ].

Lines ending in "status" represent entry in do_IRQ(), "stat" represents
exit from do_IRQ() (label "out:").

Where the <== arrows are, it seems that CPU 0 and 1 are executing the
timer IRQ in parallel (!). The entry to do_IRQ for CPU 0 is not listed
because my conditions for printk were not met.

Very strange, too: After the PIC is masked, we see 4(!) more timer
interrupts - where do they come from ?

My guess: We are in a situation where CPU 0 is overwhelmed by interrupts
from the 5 network interfaces in this machine. This leads to timer
interrupts "piling up", and eventually an interrupt is even delivered
to CPU 1. If this happens to occur simultaneously with CPU 0, we get
some sort of race condition.

Our PCI bus protocol analysis gives another hint:

It seems that the problem occurs in this code fragment from time.c:

#ifdef CONFIG_X86_IO_APIC
	if (timer_ack) {
		/*
		 * Subtle, when I/O APICs are used we have to ack timer IRQ
		 * manually to reset the IRR bit for do_slow_gettimeoffset().
		 * This will also deassert NMI lines for the watchdog if run
		 * on an 82489DX-based system.
		 */
		spin_lock(&i8259A_lock);
		outb(0x0c, 0x20);
		/* Ack the IRQ; AEOI will end it automatically. */   <==
		inb(0x20);
		spin_unlock(&i8259A_lock);
	}
#endif

The above fragment is executed ~50us after the last write of the (correct)
value 0xfa to port 0x21.

Where the <== arrow is, the analyzer clearly shows the outb (0x0c, 0x20)
operation. After that, we see strange cycles for ~70us (probably special
cycles for arbitration, definitely no valid data transfers). The very next bus
operation is a read on port 0x21 that reveals the junk value 0x07! The
inb(0x20) call is not captured in our protocol, it must occur long after
the error. (We saw normal execution of the above code fragment where
there is ~1us between the outb and inb, where it is >120us here).

That's our state of insight so far. Any hints are very welcome.

Martin

PS We were testing with RedHat's 2.4.9-21 and SuSE's 2.4.16 kernel.
   AFAICS, there are no significant changes between these and newer
   kernels wrt APIC handling. The error occurs only once in a few
   hours, so our testing possibilities are limited by time.
   And yes, we were using Intel's e100.o driver for the ethernet
   boards. But the driver is not to blame, the problem occurs even
   if no driver is loaded for the cards at all.

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1	    mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy






^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2002-03-27 17:35 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-03-13 19:11 Severe IRQ problems on Foster (P4 Xeon) system Martin Wilck
2002-03-13 19:10 ` Ingo Molnar
2002-03-13 22:22   ` Bill Davidsen
2002-03-13 22:51     ` Alan Cox
2002-03-14  5:29       ` rddunlap
2002-03-14 14:26         ` Alan Cox
2002-03-15  7:55     ` Ingo Molnar
2002-03-14 13:29   ` Martin Wilck
2002-03-14 14:34     ` Martin J. Bligh
2002-03-14 18:22       ` Martin J. Bligh
2002-03-15  7:50         ` Ingo Molnar
2002-03-15  7:43     ` Ingo Molnar
2002-03-15 18:08       ` Martin J. Bligh
2002-03-19 13:38         ` Maciej W. Rozycki
2002-03-19 13:11   ` Maciej W. Rozycki
2002-03-27 17:35   ` Steffen Persvold
2002-03-19 14:32 ` Maciej W. Rozycki
2002-03-19 16:39   ` Martin Wilck
2002-03-19 23:28     ` Pavel Machek
2002-03-20  7:52       ` Martin Wilck
2002-03-20 13:27         ` Maciej W. Rozycki
2002-03-20 13:47           ` Martin Wilck
2002-03-20 14:13             ` Maciej W. Rozycki
2002-03-20 14:29           ` Alan Cox
2002-03-21 18:53           ` Martin Wilck
2002-03-20 13:52     ` Maciej W. Rozycki
2002-03-20 16:19       ` Martin Wilck
2002-03-21 13:03         ` Maciej W. Rozycki
2002-03-20 20:24   ` William Lee Irwin III
2002-03-21  6:54     ` Zwane Mwaikambo
2002-03-21 23:35       ` William Lee Irwin III
2002-03-22  0:18         ` Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox