NMI watchdog + NOHZ question

* NMI watchdog + NOHZ question
@ 2009-06-22  7:27 David Miller
  2009-06-22  8:18 ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-22  7:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: sparclinux

If some expert in this area can help I'd appreciate it.
And I'll note immediately that the issue I'm looking into
I've only investigated thoroughly with 2.6.29 vanilla.

In 2.6.29 we added an NMI watchdog timer to sparc64.  It operates
identically to the x86 one, except that it's on by default :-)

When the qla2xxx driver is built into the kernel statically,
the firmware load causes an NMI watchdog timeout.

The qla2xxx driver is fine, it only actually disables interrupts for
very short periods to program the chip registers, telling it to load a
few blocks of the firmware via DMA or similar.

Then it waits for the interrupt to signal the firmware partial-load is
done using wait_for_completion_timeout() (see qla2x00_mailbox_command
in drivers/scsi/qla2xxx/qla_mbx.c)

Assuming NOHZ is enabled, what if qla2xxx driver init is the only
running task on a cpu, no timers (at least for 5 seconds, the NMI
timeout) are due to fire, and the qla2xxx code loops in this manner
for more than 5 seconds loading the firmware?

As far as I can see it, the NOHZ code has no reason to start the timer
firing again in this situation.

So we'll just loop continuously into the scheduler (to wait for
the qla2xxx driver completion).  I believe the events trigger quick
enough that need_resched() is not true if the scheduler even makes
it to the idle thread.

So the sequence seems to be scheduling in and out of a pure kernel
thread, with no pending non-scheduler timers for a long time, and all
this happening for longer than the NMI watchdog timeout, with NOHZ
enabled.

I'll note that adding printk's (this is a serial console) to the
qla2xxx mailbox command code makes the NMI watchdog problem go away :)
But if I only put printk's around the entire firmware loading
sequence, the NMI watchdog does trigger.

Is there something fundamental that should be preventing this?

* Re: NMI watchdog + NOHZ question
  2009-06-22  7:27 NMI watchdog + NOHZ question David Miller
@ 2009-06-22  8:18 ` Andi Kleen
  2009-06-22  9:27   ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-06-22  8:18 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, sparclinux

David Miller <davem@davemloft.net> writes:
>
> Is there something fundamental that should be preventing this?

Unless that changed recently when I wasn't looking NOHZ should only
stop timers when the CPU is idle. So when a driver is doing
something and the interrupts are not disabled for too long the timers
should be ticking.

Then when you're idle interrupts should be never off, so the NMI
watchdog cannot fire. On x86 often the NMI watchdog is in fact
stopped on idle.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-22  8:18 ` Andi Kleen
@ 2009-06-22  9:27   ` David Miller
  2009-06-24  0:17     ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-22  9:27 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Mon, 22 Jun 2009 10:18:50 +0200

> David Miller <davem@davemloft.net> writes:
>>
>> Is there something fundamental that should be preventing this?
> 
> Unless that changed recently when I wasn't looking NOHZ should only
> stop timers when the CPU is idle. So when a driver is doing
> something and the interrupts are not disabled for too long the timers
> should be ticking.
> 
> Then when you're idle interrupts should be never off, so the NMI
> watchdog cannot fire. On x86 often the NMI watchdog is in fact
> stopped on idle.

Thanks Andi.

I think something else is afoot, because while using "nohz=off" makes
the problem go away, simply adding a NMI watchdog touch after the
schedule() call in cpu_idle() does not make the problem go away.

Also, the cpu that gets the NMI watchdog is different from the cpu
running the qla2xxx driver init.   That basically destroys the bulk
of my theory :-)

* Re: NMI watchdog + NOHZ question
  2009-06-22  9:27   ` David Miller
@ 2009-06-24  0:17     ` David Miller
  2009-06-24  7:03       ` Andi Kleen
  2009-09-03  9:36       ` David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2009-06-24  0:17 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: David Miller <davem@davemloft.net>
Date: Mon, 22 Jun 2009 02:27:52 -0700 (PDT)

> I think something else is afoot, because while using "nohz=off" makes
> the problem go away, simply adding a NMI watchdog touch after the
> schedule() call in cpu_idle() does not make the problem go away.
> 
> Also, the cpu that gets the NMI watchdog is different from the cpu
> running the qla2xxx driver init.   That basically destroys the bulk
> of my theory :-)

Ok, I think I know what's happening now.

CPU 0 is in the driver init and looping submitting mailbox
commands to load the firmware, then waiting for completion.

CPU 1 is receiving the device interrupts.  CPU 1 is where the
NMI watchdog triggers.

CPU 0 is submitting mailbox commands fast enough that by the
time CPU 1 returns from the device interrupt handler, a new
one is pending.  This sequence runs for more than 5 seconds.

The problematic case is CPU 1's timer interrupt running when
the barrage of device interrupts begins.  Then we have:

	timer interrupt
	return for softirq checking
	pending, thus enable interrupts

		 qla2xxx interrupt
		 return
		 qla2xxx interrupt
		 return
		 ... 5+ seconds pass
		 final qla2xxx interrupt for fw load
		 return

	run timer softirq
	return

At some point in the multi-second qla2xxx interrupt storm we trigger
the NMI watchdog on CPU 1 from the NMI interrupt handler.

The timer softirq, once we get back to running it, is smart enough to
run the timer work enough times to make up for the missed timer
interrupts.

However, the NMI watchdogs (both x86 and sparc) use the timer
interrupt count to notice the cpu is wedged.  But in the above
scenario we'll receive only one such timer interrupt even if we
last all the way back to running the timer softirq.

I'm not exactly sure what to do about this.
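
The timer-interrupt-count check described above can be stepped through
with a rough userspace model.  This is only a sketch under stated
assumptions, not the kernel code: struct wd_cpu, wd_nmi() and
wd_nmis_until_fire() are made-up names, time advances in whole watchdog
NMI periods, and a limit of N seconds is expressed as N * nmi_hz NMIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu watchdog state; field names are illustrative. */
struct wd_cpu {
	unsigned long irq_sum;      /* timer interrupts seen so far */
	unsigned long last_irq_sum; /* count sampled at the previous NMI */
	unsigned int alert_counter; /* consecutive NMIs without progress */
	bool touched;               /* set by a watchdog "touch" */
};

/* One watchdog NMI.  Returns true when a lockup would be reported. */
static bool wd_nmi(struct wd_cpu *c, unsigned int nmi_hz,
		   unsigned int limit_secs)
{
	assert(nmi_hz > 0);

	if (!c->touched && c->last_irq_sum == c->irq_sum) {
		/* No timer interrupt since the last NMI: count it. */
		c->alert_counter++;
		return c->alert_counter == limit_secs * nmi_hz;
	}

	/* Progress was seen (or we were touched): start over. */
	c->touched = false;
	c->last_irq_sum = c->irq_sum;
	c->alert_counter = 0;
	return false;
}

/*
 * Run up to max_nmis NMIs.  A timer interrupt is counted every
 * timer_period NMIs; timer_period == 0 means the timer never fires.
 * Returns the NMI number at which the watchdog fires, or 0 if never.
 */
static unsigned int wd_nmis_until_fire(unsigned int nmi_hz,
				       unsigned int limit_secs,
				       unsigned int timer_period,
				       unsigned int max_nmis)
{
	struct wd_cpu c = {0};

	for (unsigned int n = 1; n <= max_nmis; n++) {
		if (timer_period && n % timer_period == 0)
			c.irq_sum++;
		if (wd_nmi(&c, nmi_hz, limit_secs))
			return n;
	}
	return 0;
}
```

With nmi_hz = 1 and a 5 second limit, a cpu whose timer interrupt count
never moves is reported on the 5th NMI, while a cpu that takes a timer
interrupt in every NMI period is never reported.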

* Re: NMI watchdog + NOHZ question
  2009-06-24  0:17     ` David Miller
@ 2009-06-24  7:03       ` Andi Kleen
  2009-06-24  7:08         ` David Miller
  2009-09-03  9:36       ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-06-24  7:03 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

> I'm not exactly sure what to do about this.

Ack the timer interrupt earlier (and also give it a high priority?)

That could still be problematic if you have non-nestable irq stacks
(haven't checked if sparc has that or not); potentially you might
need to run the softirq on the process stack.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:03       ` Andi Kleen
@ 2009-06-24  7:08         ` David Miller
  2009-06-24  7:15           ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-24  7:08 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 09:03:15 +0200

>> I'm not exactly sure what to do about this.
> 
> Ack the timer interrupt earlier (and also give it a high priority?)

It has a higher priority, but all interrupts get re-enabled right
before we process software interrupts.  So the flood of qla2xxx
interrupts can come in before we can run the timer softirq and
thus schedule the next timer interrupt.
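
To make the ordering issue concrete, here is a small userspace sketch of
a one-shot timer during an interrupt storm.  It assumes, as described in
this message, that the next expiry is programmed from the timer softirq
and that the storm keeps the softirq from running.  The enum, the
function name and the tick-based time model are all illustrative, and
the REARM_IN_HARDIRQ case further assumes the timer interrupt can still
be delivered while the device interrupts keep arriving.

```c
#include <assert.h>

/* Where the one-shot timer gets programmed for its next expiry. */
enum rearm_point { REARM_IN_SOFTIRQ, REARM_IN_HARDIRQ };

/*
 * Count timer interrupts delivered during a storm of storm_len ticks.
 * The first timer interrupt arrives at tick 0, just as the storm
 * starts; period is the timer period in ticks.
 */
static unsigned int timer_irqs_during_storm(enum rearm_point where,
					    unsigned int period,
					    unsigned int storm_len)
{
	unsigned int count = 0;
	unsigned int next_expiry = 0;
	int armed = 1;

	assert(period > 0);

	for (unsigned int t = 0; t < storm_len; t++) {
		if (armed && t == next_expiry) {
			count++;
			if (where == REARM_IN_HARDIRQ)
				next_expiry = t + period;
			else
				armed = 0; /* softirq is starved by the storm */
		}
	}
	return count;
}
```

In this model a 5 tick storm with a 1 tick period sees a single timer
interrupt when the rearm waits for the softirq, and five when the rearm
happens in the hardirq handler.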

> That could still be problematic if you have non-nestable irq stacks
> (haven't checked if sparc has that or not); potentially you might
> need to run the softirq on the process stack.

IRQ stacks on sparc64 work identically to how they do on x86.

I have some more theories about this, in that I always see the
NMI watchdog message with a PC right in the section of CPU idle
where NOHZ is enabled.

On these cpus there is no support for yielding, so on them I just
touch the NMI watchdog in the loop waiting for need_resched()
to become true.

But if we get the qla2xxx interrupt storm during that loop, it's
pretty easy to not touch the NMI watchdog in time.

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:08         ` David Miller
@ 2009-06-24  7:15           ` Andi Kleen
  2009-06-24  7:17             ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-06-24  7:15 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

On Wed, Jun 24, 2009 at 12:08:11AM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 24 Jun 2009 09:03:15 +0200
> 
> >> I'm not exactly sure what to do about this.
> > 
> > Ack the timer interrupt earlier (and also give it a high priority?)
> 
> It has a higher priority, but all interrupts get re-enabled right
> before we process software interrupts.  So the flood of qla2xxx
> interrupts can come in before we can run the timer softirq and
> thus schedule the next timer interrupt.

Ah you have a one shot timer and it gets rescheduled in the softirq?
If yes, why not do that directly in the hardirq handler?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:15           ` Andi Kleen
@ 2009-06-24  7:17             ` David Miller
  2009-06-24  7:53               ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-24  7:17 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 09:15:55 +0200

> On Wed, Jun 24, 2009 at 12:08:11AM -0700, David Miller wrote:
>> From: Andi Kleen <andi@firstfloor.org>
>> Date: Wed, 24 Jun 2009 09:03:15 +0200
>> 
>> >> I'm not exactly sure what to do about this.
>> > 
>> > Ack the timer interrupt earlier (and also give it a high priority?)
>> 
>> It has a higher priority, but all interrupts get re-enabled right
>> before we process software interrupts.  So the flood of qla2xxx
>> interrupts can come in before we can run the timer softirq and
>> thus schedule the next timer interrupt.
> 
> Ah you have a one shot timer and it gets rescheduled in the softirq?
> If yes, why not do that directly in the hardirq handler?

Then what's the point of the generic timer code supporting one-shot
clock sources? :-)

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:17             ` David Miller
@ 2009-06-24  7:53               ` Andi Kleen
  2009-06-24  8:51                 ` David Miller
  2009-06-24  9:44                 ` David Miller
  0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2009-06-24  7:53 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

> > Ah you have a one shot timer and it gets rescheduled in the softirq?
> > If yes, why not do that directly in the hardirq handler?
> 
> Then what's the point of the generic timer code supporting one-shot
> clock sources? :-)

Well it would avoid that problem at least (I think based on your
description). Somehow you need to reschedule the timer before the softirq.

I guess you could have a generic function that is callable from hardirq
directly?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:53               ` Andi Kleen
@ 2009-06-24  8:51                 ` David Miller
  2009-06-24  9:44                 ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2009-06-24  8:51 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 09:53:42 +0200

>> > Ah you have a one shot timer and it gets rescheduled in the softirq?
>> > If yes, why not do that directly in the hardirq handler?
>> 
>> Then what's the point of the generic timer code supporting one-shot
>> clock sources? :-)
> 
> Well it would avoid that problem at least (I think based on your
> description). Somehow you need to reschedule the timer before the softirq.
> 
> I guess you could have a generic function that is callable from hardirq
> directly?

I'll think a bit more about this, thanks Andi.

* Re: NMI watchdog + NOHZ question
  2009-06-24  7:53               ` Andi Kleen
  2009-06-24  8:51                 ` David Miller
@ 2009-06-24  9:44                 ` David Miller
  2009-06-24 10:23                   ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-24  9:44 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 09:53:42 +0200

>> > Ah you have a one shot timer and it gets rescheduled in the softirq?
>> > If yes, why not do that directly in the hardirq handler?
>> 
>> Then what's the point of the generic timer code supporting one-shot
>> clock sources? :-)
> 
> Well it would avoid that problem at least (I think based on your
> description). Somehow you need to reschedule the timer before the softirq.
> 
> I guess you could have a generic function that is callable from hardirq
> directly?

Thinking about this some more, the issue I'm hitting has nothing to
do with how the timer fires.

The problem occurs when the cpu goes into NOHZ mode, and the timer
is not firing.  And I suspect x86 would hit this problem too as
currently coded.

Using sparc64 first as a concrete example, the idle loop is essentially:

	while(1) {
		tick_nohz_stop_sched_tick(1);

		while (!need_resched() && !cpu_is_offline(cpu))
			sparc64_yield(cpu);

		tick_nohz_restart_sched_tick();

		preempt_enable_no_resched();
 ...
		schedule();
		preempt_disable();
	}

And on this particular CPU type sparc64_yield() is simply

	touch_nmi_watchdog();

since this cpu doesn't support yielding.

So if we get that 5+ second qla2xxx interrupt storm during the
"while (!need_resched() ..." loop, no matter what we do the NMI
watchdog is going to trigger on us once the qla2xxx firmware
upload is complete.

X86 32-bit's cpu_idle() looks roughly like this:

	while (1) {
		tick_nohz_stop_sched_tick(1);
		while (!need_resched()) {

			check_pgt_cache();
			rmb();

			if (cpu_is_offline(cpu))
				play_dead();

			local_irq_disable();
			/* Don't trace irqs off for idle */
			stop_critical_timings();
			pm_idle();
			start_critical_timings();
		}
		tick_nohz_restart_sched_tick();
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}

And similarly to sparc64, if that 5+ second qla2xxx interrupt
sequence happens after the tick_nohz_stop_sched_tick() call
we can run into the same situation.

Because the timer interrupt count is not incrementing, and it won't do
so for at least "5 * nmi_hz".
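
The idle-loop case can be sketched the same way.  The model below is a
simplification with assumed names (idle_storm_trips_watchdog() and its
parameters do not exist in the kernel): time advances in watchdog NMI
periods, the tick is stopped for the whole idle period so the timer
interrupt count never changes, and the idle loop touches the watchdog
once per period except while the interrupt storm keeps it from running.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the modelled watchdog fires during an idle period
 * of idle_nmis NMI periods.  The storm covers the periods
 * [storm_start, storm_start + storm_len).
 */
static bool idle_storm_trips_watchdog(unsigned int nmi_hz,
				      unsigned int limit_secs,
				      unsigned int idle_nmis,
				      unsigned int storm_start,
				      unsigned int storm_len)
{
	unsigned int alert_counter = 0;

	assert(nmi_hz > 0);

	for (unsigned int t = 0; t < idle_nmis; t++) {
		bool in_storm = t >= storm_start &&
				t - storm_start < storm_len;

		/* Outside the storm the idle loop runs and touches. */
		bool touched = !in_storm;

		/*
		 * The tick is stopped, so the timer interrupt count
		 * cannot reset the counter; only a touch can.
		 */
		if (touched)
			alert_counter = 0;
		else if (++alert_counter == limit_secs * nmi_hz)
			return true;
	}
	return false;
}
```

With nmi_hz = 1 and a 5 second limit, a storm lasting 6 periods trips
the modelled watchdog and one lasting 4 periods does not.  The same 6
period storm stays quiet if the limit is raised to 30 seconds.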

* Re: NMI watchdog + NOHZ question
  2009-06-24  9:44                 ` David Miller
@ 2009-06-24 10:23                   ` Andi Kleen
  2009-06-24 10:32                     ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-06-24 10:23 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

> And similarly to sparc64, if that 5+ second qla2xxx interrupt
> sequence happens after the tick_nohz_stop_sched_tick() call
> we can run into the same situation.

Yes it would be probably safer to do the tick disabling with interrupts off
already.

These days NMI watchdog is not used much on x86 anymore because it's 
default off, so probably people never noticed that.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24 10:23                   ` Andi Kleen
@ 2009-06-24 10:32                     ` David Miller
  2009-06-24 10:52                       ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-24 10:32 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 12:23:25 +0200

>> And similarly to sparc64, if that 5+ second qla2xxx interrupt
>> sequence happens after the tick_nohz_stop_sched_tick() call
>> we can run into the same situation.
> 
> Yes it would be probably safer to do the tick disabling with
> interrupts off already.

That only makes sense if you're really putting the cpu to sleep
until an interrupt or similar happens.

Here in this sparc64 case I'm not, I just spin waiting for the exit
from cpu_idle() conditions.

I'll think more about how I'll handle this.  It's at least a relief to
understand exactly what causes this issue now :-)

> These days NMI watchdog is not used much on x86 anymore because it's 
> default off, so probably people never noticed that.

I really didn't want to provide the feature that way on sparc64 which
is why I made it on by default.  It would be interesting to reconsider
x86's default, perhaps even only on a trial basis in -next.

It's so useful, and in the short time sparc64 has had this NMI code I
can count at least 8 bugs I've fixed only because it was on all the
time.

* Re: NMI watchdog + NOHZ question
  2009-06-24 10:32                     ` David Miller
@ 2009-06-24 10:52                       ` Andi Kleen
  2009-06-24 10:59                         ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2009-06-24 10:52 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

On Wed, Jun 24, 2009 at 03:32:33AM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 24 Jun 2009 12:23:25 +0200
> 
> >> And similarly to sparc64, if that 5+ second qla2xxx interrupt
> >> sequence happens after the tick_nohz_stop_sched_tick() call
> >> we can run into the same situation.
> > 
> > Yes it would be probably safer to do the tick disabling with
> > interrupts off already.
> 
> That only makes sense if you're really putting the cpu to sleep
> until an interrupt or similar happens.

That is what the idle loop is supposed to do, isn't it?

> > These days NMI watchdog is not used much on x86 anymore because it's 
> > default off, so probably people never noticed that.
> 
> I really didn't want to provide the feature that way on sparc64 which
> is why I made it on by default.  It would be interesting to reconsider
> x86's default, perhaps even only on a trial basis in -next.

The reason it was turned off is that there are a few systems (e.g.
laptops from a particular vendor) which don't handle NMIs correctly
in the platform. When the NMI happens while SMI is active
they hang. Also there were a few other strange problems
on other systems that went away when it was disabled.

One way to handle all that would be to have a big NMI white/black
list for specific systems. That would be useful because there are
a few cases where NMIs are really useful: one example right now
is panic, which is currently unable to stop other CPUs that
have interrupts disabled.

But creating and maintaining such a list would be a lot of 
work (at least initially), and so far nobody was interested
enough to do that.

When you don't have as many different platforms and vendors
things are a lot easier.

> 
> It's so useful, and in the short time sparc64 has had this NMI code I
> can count at least 8 bugs I've fixed only because it was on all the
> time.

Yes, when it was still on it also found bugs. On the other hand, once
it is on by default the number of new bugs you find with it goes
down quite fast.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24 10:52                       ` Andi Kleen
@ 2009-06-24 10:59                         ` David Miller
  2009-06-24 11:10                           ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2009-06-24 10:59 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 24 Jun 2009 12:52:23 +0200

> On Wed, Jun 24, 2009 at 03:32:33AM -0700, David Miller wrote:
>> From: Andi Kleen <andi@firstfloor.org>
>> Date: Wed, 24 Jun 2009 12:23:25 +0200
>> 
>> >> And similarly to sparc64, if that 5+ second qla2xxx interrupt
>> >> sequence happens after the tick_nohz_stop_sched_tick() call
>> >> we can run into the same situation.
>> > 
>> > Yes it would be probably safer to do the tick disabling with
>> > interrupts off already.
>> 
>> That only makes sense if you're really putting the cpu to sleep
>> until an interrupt or similar happens.
> 
> That is what the idle loop is supposed to do, isn't it?

Some sparc64 cpu's don't have a yield, and therefore can't
truly "sleep" during this loop.  That's what I'm talking
about.

>> > These days NMI watchdog is not used much on x86 anymore because it's 
>> > default off, so probably people never noticed that.
>> 
>> I really didn't want to provide the feature that way on sparc64 which
>> is why I made it on by default.  It would be interesting to reconsider
>> x86's default, perhaps even only on a trial basis in -next.
> 
> The reason it was turned off is that there are a few systems (e.g.
> laptops from a particular vendor) which don't handle NMIs correctly
> in the platform. When the NMI happens while SMI is active
> they hang. Also there were a few other strange problems
> on other systems that went away when it was disabled.

I wonder how many of those "few other strange problems" were of
the variety I'm diagnosing here :-)

Yes, it's a messy problem to turn on by default on x86 then.

Is this realm of systems-with-NMI-issues exclusive to x86-32 
or would it be more doable to turn it on by default for 64-bit
x86 builds?

* Re: NMI watchdog + NOHZ question
  2009-06-24 10:59                         ` David Miller
@ 2009-06-24 11:10                           ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2009-06-24 11:10 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel, sparclinux

On Wed, Jun 24, 2009 at 03:59:14AM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Wed, 24 Jun 2009 12:52:23 +0200
> 
> > On Wed, Jun 24, 2009 at 03:32:33AM -0700, David Miller wrote:
> >> From: Andi Kleen <andi@firstfloor.org>
> >> Date: Wed, 24 Jun 2009 12:23:25 +0200
> >> 
> >> >> And similarly to sparc64, if that 5+ second qla2xxx interrupt
> >> >> sequence happens after the tick_nohz_stop_sched_tick() call
> >> >> we can run into the same situation.
> >> > 
> >> > Yes it would be probably safer to do the tick disabling with
> >> > interrupts off already.
> >> 
> >> That only makes sense if you're really putting the cpu to sleep
> >> until an interrupt or similar happens.
> > 
> > That is what the idle loop is supposed to do, isn't it?
> 
> Some sparc64 cpu's don't have a yield, and therefore can't
> truly "sleep" during this loop.  That's what I'm talking
> about.

How are power saving states invoked instead? Or do they not
have any power saving idle states?

> >> > These days NMI watchdog is not used much on x86 anymore because it's 
> >> > default off, so probably people never noticed that.
> >> 
> >> I really didn't want to provide the feature that way on sparc64 which
> >> is why I made it on by default.  It would be interesting to reconsider
> >> x86's default, perhaps even only on a trial basis in -next.
> > 
> > The reason it was turned off is that there are a few systems (e.g.
> > laptops from a particular vendor) which don't handle NMIs correctly
> > in the platform. When the NMI happens while SMI is active
> > they hang. Also there were a few other strange problems
> > on other systems that went away when it was disabled.
> 
> I wonder how many of those "few other strange problems" were of
> the variety I'm diagnosing here :-)

Some likely.

But the general problem is that hardware architects do not normally
consider NMIs as owned by the OS, but rather as owned by
the platform.

> Is this realm of systems-with-NMI-issues exclusive to x86-32 
> or would it be more doable to turn it on by default for 64-bit
> x86 builds?

Some of these problems were on 64bit capable systems.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: NMI watchdog + NOHZ question
  2009-06-24  0:17     ` David Miller
  2009-06-24  7:03       ` Andi Kleen
@ 2009-09-03  9:36       ` David Miller
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2009-09-03  9:36 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel, sparclinux

From: David Miller <davem@davemloft.net>
Date: Tue, 23 Jun 2009 17:17:35 -0700 (PDT)

> I'm not exactly sure what to do about this.

As a followup I'm going to push the following to Linus and -stable to
work around the problem for the time being.

From e6617c6ec28a17cf2f90262b835ec05b9b861400 Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Thu, 3 Sep 2009 02:35:20 -0700
Subject: [PATCH] sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.

This is a compromise and a temporary workaround for bootup NMI
watchdog triggers some people see with qla2xxx devices present.

This happens when, for example:

CPU 0 is in the driver init and looping submitting mailbox commands to
load the firmware, then waiting for completion.

CPU 1 is receiving the device interrupts.  CPU 1 is where the NMI
watchdog triggers.

CPU 0 is submitting mailbox commands fast enough that by the time CPU
1 returns from the device interrupt handler, a new one is pending.
This sequence runs for more than 5 seconds.

The problematic case is CPU 1's timer interrupt running when the
barrage of device interrupts begin.  Then we have:

	timer interrupt
	return for softirq checking
	pending, thus enable interrupts

		 qla2xxx interrupt
		 return
		 qla2xxx interrupt
		 return
		 ... 5+ seconds pass
		 final qla2xxx interrupt for fw load
		 return

	run timer softirq
	return

At some point in the multi-second qla2xxx interrupt storm we trigger
the NMI watchdog on CPU 1 from the NMI interrupt handler.

The timer softirq, once we get back to running it, is smart enough to
run the timer work enough times to make up for the missed timer
interrupts.

However, the NMI watchdogs (both x86 and sparc) use the timer
interrupt count to notice the cpu is wedged.  But in the above
scenerio we'll receive only one such timer interrupt even if we last
all the way back to running the timer softirq.

The default watchdog trigger point is only 5 seconds, which is pretty
low (the softwatchdog triggers at 60 seconds).  So increase it to 30
seconds for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/kernel/nmi.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 2c0cc72..b75bf50 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -103,7 +103,7 @@ notrace __kprobes void perfctr_irq(int irq, struct pt_regs *regs)
 	}
 	if (!touched && __get_cpu_var(last_irq_sum) == sum) {
 		local_inc(&__get_cpu_var(alert_counter));
-		if (local_read(&__get_cpu_var(alert_counter)) == 5 * nmi_hz)
+		if (local_read(&__get_cpu_var(alert_counter)) == 30 * nmi_hz)
 			die_nmi("BUG: NMI Watchdog detected LOCKUP",
 				regs, panic_on_timeout);
 	} else {
-- 
1.6.4.2
