public inbox for linux-kernel@vger.kernel.org
* x86's nmi_hz wrt. oprofile's nmi_timer_int.c
@ 2009-01-29 23:58 David Miller
  2009-01-30 15:01 ` Ingo Molnar
  2009-02-22 17:06 ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2009-01-29 23:58 UTC (permalink / raw)
  To: tglx, mingo; +Cc: linux-kernel


While working on an NMI watchdog implementation on sparc64
I noticed what seems to be a peculiar behavior of the NMI
timer int oprofile support on x86.

When the NMI watchdog tests itself at boot time, we start
with nmi_hz equal to HZ.

After the NMI watchdog self-test passes, nmi_hz is reduced
down to '1'.

The NMI timer int oprofile support simply uses DIE_NMI notifiers for
its implementation.  But I don't see anything in the code of
arch/x86/oprofile/nmi_timer_int.c nor the NMI watchdog infrastructure
which will re-adjust nmi_hz back to HZ or something similar.

Am I missing something?


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-01-29 23:58 x86's nmi_hz wrt. oprofile's nmi_timer_int.c David Miller
@ 2009-01-30 15:01 ` Ingo Molnar
  2009-01-30 21:54   ` David Miller
  2009-02-22 17:06 ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2009-01-30 15:01 UTC (permalink / raw)
  To: David Miller; +Cc: tglx, mingo, linux-kernel, H. Peter Anvin


* David Miller <davem@davemloft.net> wrote:

> While working on an NMI watchdog implementation on sparc64 I noticed 
> what seems to be a peculiar behavior of the NMI timer int oprofile 
> support on x86.
> 
> When the NMI watchdog tests itself at boot time, we start with nmi_hz 
> equal to HZ.
> 
> After the NMI watchdog self-test passes, nmi_hz is reduced down to '1'.
> 
> The NMI timer int oprofile support simply uses DIE_NMI notifiers for 
> its implementation.  But I don't see anything in the code of 
> arch/x86/oprofile/nmi_timer_int.c nor the NMI watchdog infrastructure 
> which will re-adjust nmi_hz back to HZ or something similar.
> 
> Am I missing something?

Reducing it to 1 Hz was kind of a performance hack: running NMIs at HZ 
needlessly interrupts the CPU HZ times a second. It's more than enough to 
have 1 nmi-watchdog tick per second to notice deadlocks that take longer 
than 5 seconds.
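
To put rough numbers on that trade-off, here is a small userspace sketch
(not kernel code). It assumes the watchdog raises the alarm after 5 seconds'
worth of consecutive stalled NMI ticks, as described above; the struct and
function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of the watchdog's cost, not taken from the kernel. */
struct watchdog_cost {
    unsigned int nmis_per_second;  /* steady-state NMI load per CPU */
    unsigned int ticks_to_detect;  /* stalled NMI ticks before the alarm */
};

/*
 * nmi_hz:    NMI watchdog ticks per second (HZ during self-test, 1 after).
 * timeout_s: how long a CPU may be stuck before we report it (5 above).
 */
static struct watchdog_cost estimate_watchdog_cost(unsigned int nmi_hz,
                                                   unsigned int timeout_s)
{
    struct watchdog_cost cost;

    assert(nmi_hz > 0);
    cost.nmis_per_second = nmi_hz;
    cost.ticks_to_detect = nmi_hz * timeout_s;
    return cost;
}
```

With nmi_hz = 1 that is 1 NMI per second and 5 stalled ticks to detect a
lockup; with nmi_hz = 250 (an example HZ value) it is 250 NMIs per second
and 1250 ticks, for the same 5-second detection latency.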

Can you see a problem with that approach, or was this just a question 
about why it's reduced to 1 Hz?

	Ingo


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-01-30 15:01 ` Ingo Molnar
@ 2009-01-30 21:54   ` David Miller
  2009-02-02 23:14     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2009-01-30 21:54 UTC (permalink / raw)
  To: mingo; +Cc: tglx, mingo, linux-kernel, hpa

From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 30 Jan 2009 16:01:25 +0100

> 
> * David Miller <davem@davemloft.net> wrote:
> 
> > While working on an NMI watchdog implementation on sparc64 I noticed 
> > what seems to be a peculiar behavior of the NMI timer int oprofile 
> > support on x86.
> > 
> > When the NMI watchdog tests itself at boot time, we start with nmi_hz 
> > equal to HZ.
> > 
> > After the NMI watchdog self-test passes, nmi_hz is reduced down to '1'.
> > 
> > The NMI timer int oprofile support simply uses DIE_NMI notifiers for 
> > its implementation.  But I don't see anything in the code of 
> > arch/x86/oprofile/nmi_timer_int.c nor the NMI watchdog infrastructure 
> > which will re-adjust nmi_hz back to HZ or something similar.
> > 
> > Am I missing something?
> 
> Reducing it to 1 Hz was kind of a performance hack: running NMIs at HZ 
> needlessly interrupts the CPU HZ times a second. It's more than enough to 
> have 1 nmi-watchdog tick per second to notice deadlocks that take longer 
> than 5 seconds.

For the NMI watchdog's purposes I understand the intent, and this
is perfectly fine.

The problem is that it stays at '1' when oprofile starts using the NMI
watchdog, and we certainly want more than one oprofile tick per second
:-)


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-01-30 21:54   ` David Miller
@ 2009-02-02 23:14     ` David Miller
  2009-02-03 12:27       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2009-02-02 23:14 UTC (permalink / raw)
  To: mingo; +Cc: tglx, mingo, linux-kernel, hpa

From: David Miller <davem@davemloft.net>
Date: Fri, 30 Jan 2009 13:54:09 -0800 (PST)

> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 30 Jan 2009 16:01:25 +0100
> 
> > 
> > * David Miller <davem@davemloft.net> wrote:
> > 
> > Reducing it to 1 Hz was kind of a performance hack: running NMIs at HZ 
> > needlessly interrupts the CPU HZ times a second. It's more than enough to 
> > have 1 nmi-watchdog tick per second to notice deadlocks that take longer 
> > than 5 seconds.
> 
> For the NMI watchdog's purposes I understand the intent, and this
> is perfectly fine.
> 
> The problem is that it stays at '1' when oprofile starts using the NMI
> watchdog, and we certainly want more than one oprofile tick per second
> :-)

Just making sure you understand the problem, here is the
sequence of events:

1) At bootup, the NMI watchdog is tested.

   It is tested with nmi_hz=HZ

2) If the test passes, nmi_hz is reduced down to '1'

As I stated, everything up to this point is fine.  Next:

3) oprofile initializes and if we choose to use the NMI
   timer for oprofile profiling it is implemented using
   a simple DIE_NMI notifier.

   However, nmi_hz is still just '1' which means that oprofile
   will only receive one sample per second.  And this is definitely
   not what we want.

Somehow the code in arch/x86/oprofile/nmi_timer_int.c needs to
have an interface into the NMI watchdog core so that it can
increase nmi_hz back up to "HZ" when the NMI timer profiling is
enabled and back down to "1" when such profiling stops.
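
One possible shape for such an interface, as a userspace model only (not
kernel code): a use count that raises the NMI rate to HZ while at least one
profiler needs it and drops it back to 1 afterwards. All function names here
are hypothetical, and MODEL_HZ = 250 is just an example value standing in
for HZ.

```c
#include <assert.h>

#define MODEL_HZ 250                    /* example stand-in for the kernel's HZ */

static unsigned int nmi_hz = MODEL_HZ;  /* model of the watchdog NMI rate */
static unsigned int fast_users;         /* hypothetical: clients needing HZ-rate NMIs */

/* Steps 1-2: the self-test ran at HZ; once it passes, slow down to 1. */
static void watchdog_selftest_passed(void)
{
    if (fast_users == 0)
        nmi_hz = 1;
}

/* Hypothetical hook for step 3: NMI timer profiling is being enabled. */
static void nmi_rate_get_fast(void)
{
    if (fast_users++ == 0)
        nmi_hz = MODEL_HZ;
}

/* Hypothetical hook: NMI timer profiling has stopped. */
static void nmi_rate_put_fast(void)
{
    assert(fast_users > 0);
    if (--fast_users == 0)
        nmi_hz = 1;
}

/* Samples a DIE_NMI-style listener would receive over `seconds`. */
static unsigned int samples_seen(unsigned int seconds)
{
    return nmi_hz * seconds;
}
```

In this model, after watchdog_selftest_passed() a 10-second profile sees
only 10 samples; between nmi_rate_get_fast() and nmi_rate_put_fast() it
sees 2500.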


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-02-02 23:14     ` David Miller
@ 2009-02-03 12:27       ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2009-02-03 12:27 UTC (permalink / raw)
  To: David Miller, Paul Mackerras, Peter Zijlstra
  Cc: tglx, mingo, linux-kernel, hpa


* David Miller <davem@davemloft.net> wrote:

> From: David Miller <davem@davemloft.net>
> Date: Fri, 30 Jan 2009 13:54:09 -0800 (PST)
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Fri, 30 Jan 2009 16:01:25 +0100
> > 
> > > 
> > > * David Miller <davem@davemloft.net> wrote:
> > > 
> > > > Reducing it to 1 Hz was kind of a performance hack: running NMIs at HZ 
> > > needlessly interrupts the CPU HZ times a second. It's more than enough to 
> > > have 1 nmi-watchdog tick per second to notice deadlocks that take longer 
> > > than 5 seconds.
> > 
> > For the NMI watchdog's purposes I understand the intent, and this
> > is perfectly fine.
> > 
> > The problem is that it stays at '1' when oprofile starts using the NMI
> > watchdog, and we certainly want more than one oprofile tick per second
> > :-)
> 
> Just making sure you understand the problem, here is the
> sequence of events:
> 
> 1) At bootup, the NMI watchdog is tested.
> 
>    It is tested with nmi_hz=HZ
> 
> 2) If the test passes, nmi_hz is reduced down to '1'
> 
> As I stated, everything up to this point is fine.  Next:
> 
> 3) oprofile initializes and if we choose to use the NMI
>    timer for oprofile profiling it is implemented using
>    a simple DIE_NMI notifier.
> 
>    However, nmi_hz is still just '1' which means that oprofile
>    will only receive one sample per second.  And this is definitely
>    not what we want.
> 
> Somehow the code in arch/x86/oprofile/nmi_timer_int.c needs to have an 
> interface into the NMI watchdog core so that it can increase nmi_hz back 
> up to "HZ" when the NMI timer profiling is enabled and back down to "1" 
> when such profiling stops.

Btw., these types of interactions will be solved in a natural way via 
perfcounters: in that model the NMI watchdog is a set of per-CPU counters 
running on each CPU [with an NMI watchdog callback in the IRQ handling 
routine] - and oprofile uses its own perfcounter, drawn from what is left 
of the PMU hardware.

I.e. each PMU-using facility can just use performance counters 
transparently, and interactions will be resolved naturally by the 
perfcounters resource management.

	Ingo


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-01-29 23:58 x86's nmi_hz wrt. oprofile's nmi_timer_int.c David Miller
  2009-01-30 15:01 ` Ingo Molnar
@ 2009-02-22 17:06 ` Andi Kleen
  2009-02-23  4:11   ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2009-02-22 17:06 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel

David Miller <davem@davemloft.net> writes:

Really old mail, but I was very behind. I didn't see a
correct answer, so let's answer it.

> While working on an NMI watchdog implementation on sparc64
> I noticed what seems to be a peculiar behavior of the NMI
> timer int oprofile support on x86.
>
> When the NMI watchdog tests itself at boot time, we start
> with nmi_hz equal to HZ.
>
> After the NMI watchdog self-test passes, nmi_hz is reduced
> down to '1'.
>
> The NMI timer int oprofile support simply uses DIE_NMI notifiers for
> its implementation.  But I don't see anything in the code of
> arch/x86/oprofile/nmi_timer_int.c nor the NMI watchdog infrastructure
> which will re-adjust nmi_hz back to HZ or something similar.
>
> Am I missing something?

oprofile generates its own NMIs; it does not rely on
the ones from the NMI watchdog.

In timer mode it does not use NMIs or die notifiers, but relies on the
regular non-NMI timer interrupt.

Does that answer your question?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-02-22 17:06 ` Andi Kleen
@ 2009-02-23  4:11   ` David Miller
  2009-02-23  4:52     ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2009-02-23  4:11 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel

From: Andi Kleen <andi@firstfloor.org>
Date: Sun, 22 Feb 2009 18:06:45 +0100

> David Miller <davem@davemloft.net> writes:
> 
> Really old mail, but I was very behind. I didn't see a
> correct answer, so let's answer it.
> 
> > While working on an NMI watchdog implementation on sparc64
> > I noticed what seems to be a peculiar behavior of the NMI
> > timer int oprofile support on x86.
> >
> > When the NMI watchdog tests itself at boot time, we start
> > with nmi_hz equal to HZ.
> >
> > After the NMI watchdog self-test passes, nmi_hz is reduced
> > down to '1'.
> >
> > The NMI timer int oprofile support simply uses DIE_NMI notifiers for
> > its implementation.  But I don't see anything in the code of
> > arch/x86/oprofile/nmi_timer_int.c nor the NMI watchdog infrastructure
> > which will re-adjust nmi_hz back to HZ or something similar.
> >
> > Am I missing something?
> 
> oprofile generates its own NMIs; it does not rely on
> the ones from the NMI watchdog.

The code in nmi_timer_int.c doesn't.

> In timer mode it does not use NMIs or die notifiers, but relies on the
> regular non-NMI timer interrupt.

Again, the code in nmi_timer_int.c doesn't.

It uses the NMI watchdog timer interrupts: it catches DIE_NMI
events.

> Does that answer your question?

Not really.


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-02-23  4:11   ` David Miller
@ 2009-02-23  4:52     ` Andi Kleen
  2009-02-23  5:59       ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2009-02-23  4:52 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel

> Again, the code in nmi_timer_int.c doesn't.
> 
> It uses the NMI watchdog timer interrupts: it catches DIE_NMI
> events.
> 
> > Does that answer your question?
> 
> Not really.

Ah, I see what you mean now. The nmi_timer_int code can only ever
be active when the CPU is not known to nmi_int.c and
the NMI watchdog is in IO-APIC mode. But IO-APIC mode
doesn't use the fast check/slowdown, because it always runs
at HZ frequency. The slowdown only happens in LAPIC mode.

The standard fallback for an unknown CPU is the non-NMI timer
fallback in oprofile_init; the IO-APIC mode almost never happens in practice.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-02-23  4:52     ` Andi Kleen
@ 2009-02-23  5:59       ` David Miller
  2009-02-23  6:34         ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2009-02-23  5:59 UTC (permalink / raw)
  To: andi; +Cc: linux-kernel

From: Andi Kleen <andi@firstfloor.org>
Date: Mon, 23 Feb 2009 05:52:00 +0100

> > Again, the code in nmi_timer_int.c doesn't.
> > 
> > It uses the NMI watchdog timer interrupts: it catches DIE_NMI
> > events.
> > 
> > > Does that answer your question?
> > 
> > Not really.
> 
> Ah, I see what you mean now. The nmi_timer_int code can only ever
> be active when the CPU is not known to nmi_int.c and
> the NMI watchdog is in IO-APIC mode. But IO-APIC mode
> doesn't use the fast check/slowdown, because it always runs
> at HZ frequency. The slowdown only happens in LAPIC mode.
> 
> The standard fallback for an unknown CPU is the non-NMI timer
> fallback in oprofile_init; the IO-APIC mode almost never happens in practice.

Look at the fallback logic: the pure NMI profiler can fail for
a number of reasons, not just because the watchdog is in
I/O APIC mode.

No matter what the failure reason, nmi_int.c is used.

And in some of those cases, the nmi_hz has been decreased to '1'.


* Re: x86's nmi_hz wrt. oprofile's nmi_timer_int.c
  2009-02-23  5:59       ` David Miller
@ 2009-02-23  6:34         ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-23  6:34 UTC (permalink / raw)
  To: David Miller; +Cc: andi, linux-kernel

> Look at the fallback logic: the pure NMI profiler can fail for
> a number of reasons, not just because the watchdog is in
> I/O APIC mode.

I assume you mean nmi_init? It doesn't check for the watchdog at all,
just whether it knows the CPU and whether it can profile.
All the reasons it fails on (unknown CPU, no APIC) imply that the
LAPIC-based watchdog won't run either, because it relies on the same
perfctr hardware. The only case where it could fall into
this path is IO-APIC NMI watchdog mode (with an unknown CPU), and then again
the IO-APIC watchdog doesn't do the multiple-frequencies thing; it always
runs at HZ.

Admittedly the logic is quite obscure.
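
Spelled out as a small userspace model (not the kernel code; the enum and
function names are hypothetical), the argument above is roughly:

```c
#include <assert.h>
#include <stdbool.h>

enum watchdog_mode { WD_NONE, WD_LAPIC, WD_IO_APIC };
enum profiler { PROF_NMI_PERFCTR, PROF_NMI_TIMER, PROF_TIMER_IRQ };

/* Which oprofile backend ends up in use, per the reasoning above. */
static enum profiler pick_profiler(bool perfctr_usable, enum watchdog_mode wd)
{
    if (perfctr_usable)
        return PROF_NMI_PERFCTR;   /* nmi_init succeeded */

    /* Claimed invariant: no usable perfctr hardware means the
     * LAPIC-based watchdog cannot be running either. */
    assert(wd != WD_LAPIC);

    if (wd == WD_IO_APIC)
        return PROF_NMI_TIMER;     /* the nmi_timer_int path */
    return PROF_TIMER_IRQ;         /* regular non-NMI timer fallback */
}

/* Watchdog NMIs per second after the boot self-test; 0 means none. */
static unsigned int watchdog_nmi_rate(enum watchdog_mode wd, unsigned int hz)
{
    switch (wd) {
    case WD_LAPIC:
        return 1;                  /* slowed down after the self-test */
    case WD_IO_APIC:
        return hz;                 /* always runs at HZ */
    default:
        return 0;
    }
}
```

The assert encodes the claim that a failed nmi_init implies no LAPIC
watchdog. If that claim does not hold in some configuration, that is
exactly the case where nmi_timer_int would see only 1 NMI per second.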

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
