* irq0 stops working
@ 2007-10-09 5:26 Vasily Averin
2007-10-09 5:34 ` Jan Engelhardt
2007-10-09 6:00 ` Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: Vasily Averin @ 2007-10-09 5:26 UTC (permalink / raw)
To: Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner,
Arjan van de Ven
Cc: devel
On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
kernel timers do not fire and tasks waiting for timer signals hang forever.
The most noticeable effect is that all write operations to disk stall and
nobody can log in on the node.
At the same time, all existing shells on the node keep working. I'm able to
read interrupt statistics from /proc/interrupts, and they show that all other
interrupt counters still change when the corresponding devices are accessed:
disk on the SATA controller, network, CD-ROM on the IDE controller, keyboard,
serial console, and LOC interrupts.
I've also found that disabling the irqbalance service on the node works around
this issue, although of course it fixes nothing.
All details about the hardware and logs can be found in
http://bugzilla.kernel.org/show_bug.cgi?id=8650
I'm able to reproduce this situation, but I have no ideas on how to continue
the investigation.
Could anybody please suggest new ways to investigate this issue?
Thank you,
Vasily Averin
OpenVZ Linux Kernel Team
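The check described in the report, watching which counters in /proc/interrupts keep advancing, can be sketched in C. This is a minimal illustration and not part of the original message: the function names irq_line_total and read_irq_total are hypothetical, and the parser assumes the usual line layout of a label, one decimal counter per CPU, then a non-numeric controller name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line.
 * Returns 1 and stores the sum in *total if the line belongs to `label`
 * (for example "0" or "LOC"), otherwise returns 0.
 * Assumption: the counters are followed by a non-numeric controller name.
 */
int irq_line_total(const char *line, const char *label,
                   unsigned long long *total)
{
    assert(line != NULL && label != NULL && total != NULL);

    while (isspace((unsigned char)*line))
        line++;

    size_t n = strlen(label);
    if (strncmp(line, label, n) != 0 || line[n] != ':')
        return 0;
    line += n + 1;

    unsigned long long sum = 0;
    for (;;) {
        char *end;

        while (*line == ' ' || *line == '\t')
            line++;
        if (!isdigit((unsigned char)*line))
            break;                      /* reached the controller name or EOL */
        unsigned long long v = strtoull(line, &end, 10);
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;                      /* token only starts with a digit */
        sum += v;
        line = end;
    }
    *total = sum;
    return 1;
}

/*
 * Scan a file in /proc/interrupts format and return the total for `label`.
 * Returns 1 if the label was found, 0 otherwise. Lines longer than the
 * buffer (very large CPU counts) are not handled in this sketch.
 */
int read_irq_total(const char *path, const char *label,
                   unsigned long long *total)
{
    char buf[4096];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(buf, sizeof buf, f) != NULL)
        found = irq_line_total(buf, label, total);
    fclose(f);
    return found;
}
```

Calling read_irq_total("/proc/interrupts", "0", &t) twice, a few seconds apart, and comparing the two totals shows whether irq0 is still being delivered. A total for "0" that stays unchanged while the total for "LOC" keeps growing would match the symptom reported here.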
* Re: irq0 stops working
From: Jan Engelhardt @ 2007-10-09 5:34 UTC (permalink / raw)
To: Vasily Averin
Cc: Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner,
Arjan van de Ven, devel
On Oct 9 2007 09:26, Vasily Averin wrote:
>
>On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
>kernel timers do not fire and tasks waiting for timer signals hang forever.
Which kernel? And have you tried CONFIG_NO_HZ=n?
* Re: irq0 stops working
From: Thomas Gleixner @ 2007-10-09 6:00 UTC (permalink / raw)
To: Vasily Averin
Cc: Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven, devel
On Tue, 9 Oct 2007, Vasily Averin wrote:
> On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
> kernel timers do not fire and tasks waiting for timer signals hang forever.
Which kernel version?
> The most noticeable effect is that all write operations to disk stall and
> nobody can log in on the node.
>
> At the same time, all existing shells on the node keep working. I'm able to
> read interrupt statistics from /proc/interrupts, and they show that all other
> interrupt counters still change when the corresponding devices are accessed:
> disk on the SATA controller, network, CD-ROM on the IDE controller, keyboard,
> serial console, and LOC interrupts.
>
> I've also found that disabling the irqbalance service on the node works around
> this issue, although of course it fixes nothing.
Well, it's at least a hint. Can you try the patch below, please?
tglx
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 6d48a4e..248987a 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -360,7 +360,7 @@ void stop_timer_interrupt(void)
 static struct irqaction irq0 = {
 	.handler	= timer_interrupt,
-	.flags		= IRQF_DISABLED | IRQF_IRQPOLL,
+	.flags		= IRQF_DISABLED | IRQF_IRQPOLL | IRQF_NOBALANCING,
 	.mask		= CPU_MASK_NONE,
 	.name		= "timer"
 };
@@ -403,6 +403,7 @@ void __init time_init(void)
 		cpu_khz / 1000, cpu_khz % 1000);
 	init_tsc_clocksource();
+	irq0.mask = cpumask_of_cpu(0);
 	setup_irq(0, &irq0);
 }
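The patch marks irq0 as not balanceable and sets its mask to CPU 0. One way to observe the resulting affinity from user space is to read the hexadecimal CPU mask in /proc/irq/0/smp_affinity and check that only bit 0 is set. The following is a sketch under stated assumptions and not part of the patch: parse_affinity_mask and pinned_to_cpu0 are hypothetical helper names, the parser handles at most 64 CPUs, and on a kernel with this patch applied the file should be treated as read-only, since writes to it may be rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Parse a CPU mask as printed in /proc/irq/<n>/smp_affinity, for example
 * "00000000,00000001\n": hex digits, optionally split into comma-separated
 * groups. Returns 0 on success, -1 on malformed input or on a mask that
 * does not fit in 64 bits (a limit of this sketch only).
 */
int parse_affinity_mask(const char *s, unsigned long long *mask)
{
    assert(s != NULL && mask != NULL);

    unsigned long long m = 0;
    int digits = 0;

    for (; *s != '\0' && *s != '\n'; s++) {
        if (*s == ',')
            continue;                   /* group separator */
        if (!isxdigit((unsigned char)*s))
            return -1;
        if (m >> 60)
            return -1;                  /* next shift would drop set bits */

        int v = isdigit((unsigned char)*s)
                    ? *s - '0'
                    : tolower((unsigned char)*s) - 'a' + 10;
        m = (m << 4) | (unsigned long long)v;
        digits++;
    }
    if (digits == 0)
        return -1;

    *mask = m;
    return 0;
}

/* True when the mask selects CPU 0 and no other CPU. */
int pinned_to_cpu0(unsigned long long mask)
{
    return mask == 1ULL;
}
```

For example, a file content of "00000000,00000001" parses to a mask for which pinned_to_cpu0 is true, whereas "f" (CPUs 0 to 3) is not pinned.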
* Re: irq0 stops working
From: Vasily Averin @ 2007-10-09 6:12 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner,
Arjan van de Ven, devel
Jan Engelhardt wrote:
> On Oct 9 2007 09:26, Vasily Averin wrote:
>> On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
>> kernel timers do not fire and tasks waiting for timer signals hang forever.
>
> Which kernel? And have you tried CONFIG_NO_HZ=n?
Originally I noticed this issue on RHEL5 kernels, but then I reproduced it on
the latest mainline kernels; in my last attempt it was 2.6.23-rc7.
Thank you for your tip about CONFIG_NO_HZ=n, I will try it.
Thank you,
Vasily Averin
OpenVZ Linux Kernel Team
* Re: irq0 stops working
From: Thomas Gleixner @ 2007-10-09 6:23 UTC (permalink / raw)
To: Vasily Averin
Cc: Jan Engelhardt, Linux Kernel Mailing List, Andrew Morton,
Arjan van de Ven, devel
On Tue, 9 Oct 2007, Vasily Averin wrote:
> Jan Engelhardt wrote:
> > On Oct 9 2007 09:26, Vasily Averin wrote:
> >> On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
> >> kernel timers do not fire and tasks waiting for timer signals hang forever.
> >
> > Which kernel? And have you tried CONFIG_NO_HZ=n?
>
> Originally I noticed this issue on RHEL5 kernels, but then I reproduced it on
> the latest mainline kernels; in my last attempt it was 2.6.23-rc7.
>
> Thank you for your tip about CONFIG_NO_HZ=n, I will try it.
You are running a 64-bit kernel, where this option is not available yet.
tglx
* Re: irq0 stops working
From: Vasily Averin @ 2007-11-06 7:15 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven, devel
Thomas Gleixner wrote:
> On Tue, 9 Oct 2007, Vasily Averin wrote:
>> On one of our servers, timer interrupts (i.e. irq0) stop working. As a result,
>> kernel timers do not fire and tasks waiting for timer signals hang forever.
>>
>> I've also found that disabling the irqbalance service on the node works around
>> this issue, although of course it fixes nothing.
>
> Well, it's at least a hint. Can you try the patch below, please?
The patch helps: after a month of testing I could not reproduce this issue
again. What are we going to do now?
Unfortunately my test node is still unstable; it now reproduces a SATA timeout
issue. Details are here:
http://bugzilla.kernel.org/show_bug.cgi?id=8650
Thank you,
Vasily Averin