public inbox for linux-kernel@vger.kernel.org
* introduce NMI_AUTO as nmi_watchdog option
@ 2010-01-11 19:16 Don Zickus
  2010-01-11 20:27 ` Cyrill Gorcunov
  0 siblings, 1 reply; 10+ messages in thread
From: Don Zickus @ 2010-01-11 19:16 UTC (permalink / raw)
  To: mingo; +Cc: aris, linux-kernel

Hi Ingo,

To dig up an old thread from last November:

======
* Aristeu Rozanski <aris@redhat.com> wrote:

> > > > > NMI_AUTO is a new nmi_watchdog option that makes LAPIC be tried
> > > > > first and if the CPU isn't supported, IOAPIC will be used. It's
> > > > > useful in cases where NMI watchdog is enabled by default in a
> > > > > kernel built for different machines. It can be configured by
> > > > > default or selected with nmi_watchdog=3 or nmi_watchdog=auto
> > > > > parameters.
> > > > 
> > > > What i'd like to see for the NMI watchdog is much more ambitious
> > > > than this: the use of perf events to run a periodic NMI callback.
> > > >
> > > > The NMI watchdog would cause the creation of a per-cpu perf_event
> > > > structure (in-kernel). All x86 CPUs that have perf event support
> > > > (the majority of them) will thus be able to have an NMI watchdog
> > > > using a nice, generic piece of code and we'd be able to phase out
> > > > the open-coded NMI watchdog code.
> > > >
> > > > The user would not notice much from this: we'd still have the
> > > > /proc/sys/kernel/nmi_watchdog toggle to turn it on/off, and we'd
> > > > still have the nmi_watchdog= boot parameter as well. But the
> > > > underlying implementation would be far more generic and far more
> > > > usable than the current code.
> > > >
> > > > Would you be interested in moving the NMI watchdog code in this
> > > > direction? Most of the perf events changes (callbacks, helpers for
> > > > in-kernel event allocations, etc.) are in latest -tip already, so
> > > > you could use that as a base.
> > >
> > > but that would work only for LAPIC. You're suggesting killing IOAPIC 
> > > mode too?
> > 
> > Would it be a big loss, with all modern systems expected to have a 
> > working lapic based NMI source? I wrote the IOAPIC mode originally
> > but i don't feel too attached to it ;-)
>
> ok, fair enough. but since it'll be another implementation, do you 
> mind applying the patches I submitted so they can be used until the 
> new implementation is in place?

For that i need to see at least an RFC v1 version series of the new 
implementation - otherwise we might end up sitting on this interim 
version with no-one doing the better variant.

========

I was going to jump in and try to do this work.  I wanted to make sure
what you were looking for here.  When you say convert nmi watchdog to perf
events, I assume you mean merging over the bits of perfctr-watchdog.c to
perf_events.c, modify nmi.c to just register as a normal perf event and
probably cleanup the oprofile stuff to match, correct?

Cheers,
Don
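
As an illustration of the NMI_AUTO behaviour quoted above (LAPIC tried
first, IOAPIC used when the CPU isn't supported), a minimal sketch in C
might look like the following. It is not the code from the submitted
patches. Only the value 3 for "auto" comes from the quoted text; the
other mode numbers, the function name nmi_resolve_mode() and the
lapic_supported flag are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* 3 == auto is taken from the quoted text; 0..2 are assumed values. */
enum nmi_mode {
    NMI_NONE = 0,
    NMI_IO_APIC = 1,
    NMI_LOCAL_APIC = 2,
    NMI_AUTO = 3,
};

/*
 * Map the requested mode to the mode that would actually be used.
 * lapic_supported stands in for a (hypothetical) probe of whether the
 * CPU can drive the watchdog from its local APIC.
 */
enum nmi_mode nmi_resolve_mode(enum nmi_mode requested, bool lapic_supported)
{
    assert(requested <= NMI_AUTO);

    /* Explicit choices are honoured as-is. */
    if (requested != NMI_AUTO)
        return requested;

    /* Auto: prefer LAPIC, fall back to IOAPIC. */
    return lapic_supported ? NMI_LOCAL_APIC : NMI_IO_APIC;
}
```

With this sketch, nmi_resolve_mode(NMI_AUTO, false) yields NMI_IO_APIC,
while an explicit request such as NMI_IO_APIC is returned unchanged.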



* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-11 19:16 introduce NMI_AUTO as nmi_watchdog option Don Zickus
@ 2010-01-11 20:27 ` Cyrill Gorcunov
  2010-01-11 20:33   ` Don Zickus
  0 siblings, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2010-01-11 20:27 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, aris, linux-kernel

On Mon, Jan 11, 2010 at 02:16:33PM -0500, Don Zickus wrote:
> Hi Ingo,
> 
...
> I was going to jump in and try to do this work.  I wanted to make sure
> what you were looking for here.  When you say convert nmi watchdog to perf
> events, I assume you mean merging over the bits of perfctr-watchdog.c to
> perf_events.c, modify nmi.c to just register as a normal perf event and
> probably cleanup the oprofile stuff to match, correct?
> 
> Cheers,
> Don
>

As far as I know -- converting perfctr-watchdog.c into perfevents
style would be quite a desirable feature. But I still didn't manage to
find time for this task :( If you're interested to start this work
-- that would be just great!

	-- Cyrill


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-11 20:27 ` Cyrill Gorcunov
@ 2010-01-11 20:33   ` Don Zickus
  2010-01-11 20:51     ` Cyrill Gorcunov
  2010-01-13  9:32     ` Ingo Molnar
  0 siblings, 2 replies; 10+ messages in thread
From: Don Zickus @ 2010-01-11 20:33 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: mingo, aris, linux-kernel

On Mon, Jan 11, 2010 at 11:27:29PM +0300, Cyrill Gorcunov wrote:
> On Mon, Jan 11, 2010 at 02:16:33PM -0500, Don Zickus wrote:
> > Hi Ingo,
> > 
> ...
> > I was going to jump in and try to do this work.  I wanted to make sure
> > what you were looking for here.  When you say convert nmi watchdog to perf
> > events, I assume you mean merging over the bits of perfctr-watchdog.c to
> > perf_events.c, modify nmi.c to just register as a normal perf event and
> > probably cleanup the oprofile stuff to match, correct?
> > 
> > Cheers,
> > Don
> >
> 
> As far as I know -- converting perfctr-watchdog.c into perfevents
> style would be quite a desirable feature. But I still didn't manage to
> find time for this task :( If you're interested to start this work
> -- that would be just great!

After looking through the code I just had some questions, perhaps you have
thought about this longer than me, what to do with the reservation code
(just remove it I assume and let perf_events _be_ the only code that
handles perf events) and what to do with some of the cpu quirks as noted
in perfctr-watchdog.c (notably some of the Intel errata for the Core
chipsets).

Cheers,
Don


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-11 20:33   ` Don Zickus
@ 2010-01-11 20:51     ` Cyrill Gorcunov
  2010-01-13  9:32     ` Ingo Molnar
  1 sibling, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2010-01-11 20:51 UTC (permalink / raw)
  To: Don Zickus; +Cc: mingo, aris, linux-kernel, Frederic Weisbecker

On Mon, Jan 11, 2010 at 03:33:56PM -0500, Don Zickus wrote:
> On Mon, Jan 11, 2010 at 11:27:29PM +0300, Cyrill Gorcunov wrote:
> > On Mon, Jan 11, 2010 at 02:16:33PM -0500, Don Zickus wrote:
> > > Hi Ingo,
> > > 
> > ...
> > > I was going to jump in and try to do this work.  I wanted to make sure
> > > what you were looking for here.  When you say convert nmi watchdog to perf
> > > events, I assume you mean merging over the bits of perfctr-watchdog.c to
> > > perf_events.c, modify nmi.c to just register as a normal perf event and
> > > probably cleanup the oprofile stuff to match, correct?
> > > 
> > > Cheers,
> > > Don
> > >
> > 
> > As far as I know -- converting perfctr-watchdog.c into perfevents
> > style would be quite a desirable feature. But I still didn't manage to
> > find time for this task :( If you're interested to start this work
> > -- that would be just great!
> 
> After looking through the code I just had some questions, perhaps you have
> thought about this longer than me, what to do with the reservation code
> (just remove it I assume and let perf_events _be_ the only code that
>  handles perf events) and what to do with some of the cpu quirks as noted
> in perfctr-watchdog.c (notably some of the Intel errata for the Core
> chipsets).
> 
> Cheers,
> Don
> 

Hi Don,

well, I must admit I didn't look too closely at this code (if I had, I would
have sent some patch for review at least :). But it was suggested that I take
a look at hw_breakpoint.c (Frederic worked on it iirc, CC'ed) as an example
of perfevent'ed code. So converting to perf events is not a trivial task,
and I fear I can't give any useful advice at the moment since, as I said, I
didn't manage to find time for this task and as a result didn't read the code
byte by byte, sorry. But if I get some idea -- will share!

	-- Cyrill


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-11 20:33   ` Don Zickus
  2010-01-11 20:51     ` Cyrill Gorcunov
@ 2010-01-13  9:32     ` Ingo Molnar
  2010-01-13 13:13       ` Peter Zijlstra
  2010-01-13 16:23       ` Don Zickus
  1 sibling, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2010-01-13  9:32 UTC (permalink / raw)
  To: Don Zickus; +Cc: Cyrill Gorcunov, aris, linux-kernel


* Don Zickus <dzickus@redhat.com> wrote:

> On Mon, Jan 11, 2010 at 11:27:29PM +0300, Cyrill Gorcunov wrote:
> > On Mon, Jan 11, 2010 at 02:16:33PM -0500, Don Zickus wrote:
> > > Hi Ingo,
> > > 
> > ...
> > > I was going to jump in and try to do this work.  I wanted to make sure
> > > what you were looking for here.  When you say convert nmi watchdog to perf
> > > events, I assume you mean merging over the bits of perfctr-watchdog.c to
> > > perf_events.c, modify nmi.c to just register as a normal perf event and
> > > probably cleanup the oprofile stuff to match, correct?
> > > 
> > > Cheers,
> > > Don
> > >
> > 
> > As far as I know -- converting perfctr-watchdog.c into perfevents
> > style would be quite a desirable feature. But I still didn't manage to
> > find time for this task :( If you're interested to start this work
> > -- that would be just great!
> 
> After looking through the code I just had some questions, perhaps you have 
> thought about this longer than me, what to do with the reservation code 
> (just remove it I assume and let perf_events _be_ the only code that
>  handles perf events) and what to do with some of the cpu quirks as noted in 
> perfctr-watchdog.c (notably some of the Intel errata for the Core chipsets).

Given the amount of quirks in the perfctr code it might make sense to shape
this as a new feature initially: introduce a new NMI watchdog that is perf 
based and has a different codebase.

Then, once it's capable enough and has been in circulation long enough we can 
simply drop the old NMI watchdog. (without users noticing anything [modulo 
bugs])

v1 should concentrate on x86 CPUs that are supported by perf currently. Note, 
it _might_ make sense to do it via a new kernel/nmi_watchdog.c file - other 
architectures have NMI concepts as well, such as Sparc64. A further idea would 
be to maybe even merge it with the softlockup code in kernel/softlockup.c - so 
that we don't have two sets of apis like touch_nmi_watchdog and
touch_softlockup_watchdog.
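
One way to picture the "single set of APIs" idea is a shared per-CPU
state with one touch entry point that resets both detectors at once. The
sketch below is only an illustration under stated assumptions: the
struct lockup_state layout, the names touch_watchdog() and
watchdog_expired(), and the explicit "now" timestamp argument are all
hypothetical and do not correspond to existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical merged state for the hard (NMI) and soft lockup detectors. */
struct lockup_state {
    uint64_t nmi_touch_ts;   /* last reset of the NMI-driven detector */
    uint64_t soft_touch_ts;  /* last reset of the softlockup detector */
};

/* Single entry point: one call resets both detectors. */
void touch_watchdog(struct lockup_state *st, uint64_t now)
{
    assert(st != NULL);
    st->nmi_touch_ts = now;
    st->soft_touch_ts = now;
}

/* True when more than 'timeout' time units passed since 'last_touch'. */
bool watchdog_expired(uint64_t last_touch, uint64_t now, uint64_t timeout)
{
    assert(now >= last_touch);
    return now - last_touch > timeout;
}
```

Callers that today pick between touch_nmi_watchdog and
touch_softlockup_watchdog would, in this picture, only need the one call.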

So there's a wide spectrum of possibilities - the important thing is to start 
small :-)

	Ingo


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-13  9:32     ` Ingo Molnar
@ 2010-01-13 13:13       ` Peter Zijlstra
  2010-01-13 16:25         ` Don Zickus
  2010-01-13 16:35         ` Ingo Molnar
  2010-01-13 16:23       ` Don Zickus
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2010-01-13 13:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Don Zickus, Cyrill Gorcunov, aris, linux-kernel

On Wed, 2010-01-13 at 10:32 +0100, Ingo Molnar wrote:
> other architectures have NMI concepts as well, such as Sparc64. 

I think both sparc64 and ppc64 fake NMIs by playing games with hw IRQ
priorities and partial masks. But yes.

One interesting 'feature' for the perf-nmi interaction is creating an
idle scheduling class for counters, because as long as there is a
counter present you can use its NMIs to drive the watchdog, but as soon
as there are none left, you need to install one.





* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-13  9:32     ` Ingo Molnar
  2010-01-13 13:13       ` Peter Zijlstra
@ 2010-01-13 16:23       ` Don Zickus
  1 sibling, 0 replies; 10+ messages in thread
From: Don Zickus @ 2010-01-13 16:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Cyrill Gorcunov, aris, linux-kernel

On Wed, Jan 13, 2010 at 10:32:40AM +0100, Ingo Molnar wrote:
> > After looking through the code I just had some questions, perhaps you have 
> > thought about this longer than me, what to do with the reservation code 
> > (just remove it I assume and let perf_events _be_ the only code that
> >  handles perf events) and what to do with some of the cpu quirks as noted in 
> > perfctr-watchdog.c (notably some of the Intel errata for the Core chipsets).
> 
> Given the amount of quirks in the perfctr code it might make sense to shape
> this as a new feature initially: introduce a new NMI watchdog that is perf 
> based and has a different codebase.
> 
> Then, once it's capable enough and has been in circulation long enough we can 
> simply drop the old NMI watchdog. (without users noticing anything [modulo 
> bugs])
> 
> v1 should concentrate on x86 CPUs that are supported by perf currently. Note, 
> it _might_ make sense to do it via a new kernel/nmi_watchdog.c file - other 
> architectures have NMI concepts as well, such as Sparc64. A further idea would 
> be to maybe even merge it with the softlockup code in kernel/softlockup.c - so 
> that we don't have two sets of apis like touch_nmi_watchdog and
> touch_softlockup_watchdog.

Ok, interesting.  Right now I am working on making sure I know how to
register something with the perf event framework (from kernel space).
Once I can do that, I'll expand it outward and see where it goes. :-)

> 
> So there's a wide spectrum of possibilities - the important thing is to start 
> small :-)

I see. Thanks.

Cheers,
Don


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-13 13:13       ` Peter Zijlstra
@ 2010-01-13 16:25         ` Don Zickus
  2010-01-13 16:42           ` Peter Zijlstra
  2010-01-13 16:35         ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Don Zickus @ 2010-01-13 16:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Cyrill Gorcunov, aris, linux-kernel

On Wed, Jan 13, 2010 at 02:13:42PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-01-13 at 10:32 +0100, Ingo Molnar wrote:
> > other architectures have NMI concepts as well, such as Sparc64. 
> 
> I think both sparc64 and ppc64 fake NMIs by playing games with hw IRQ
> priorities and partial masks. But yes.
> 
> One interesting 'feature' for the perf-nmi interaction is creating an
> idle scheduling class for counters, because as long as there is a
> counter present you can use its NMIs to drive the watchdog, but as soon
> as there are none left, you need to install one.

Interesting idea.  How can I guarantee the frequency of the NMI I want to
piggyback off of?  A breakpoint that takes an hour to trigger may not be
the best NMI to use?  Then again I am still trying to understand the perf
event code a little better.

Cheers,
Don


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-13 13:13       ` Peter Zijlstra
  2010-01-13 16:25         ` Don Zickus
@ 2010-01-13 16:35         ` Ingo Molnar
  1 sibling, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2010-01-13 16:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Don Zickus, Cyrill Gorcunov, aris, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-01-13 at 10:32 +0100, Ingo Molnar wrote:
> > other architectures have NMI concepts as well, such as Sparc64. 
> 
> I think both sparc64 and ppc64 fake NMIs by playing games with hw IRQ 
> priorities and partial masks. But yes.
> 
> One interesting 'feature' for the perf-nmi interaction is creating an idle 
> scheduling class for counters, because as long as there is a counter present 
> you can use its NMIs to drive the watchdog, but as soon as there are none
> left, you need to install one.

Yeah. I'd suggest to not complicate things with that initially - but to simply 
create a standalone event for it and 'waste' a counter on NMI generation. 

Later on it can indeed be a good feature to make the NMI watchdog 'seamless'
in the sense of it not causing any wasted hw resources - it can piggyback on 
any existing NMI event. (as long as that event is at least ~1 HZ strong or so)
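
The piggyback rule could be sketched roughly as below: reuse an existing
NMI-generating event if it fires at about 1 Hz or more, otherwise fall
back to a dedicated counter. This is only an illustration. The struct
nmi_event type, its rate_hz field, the WATCHDOG_MIN_HZ constant and
watchdog_pick_source() are hypothetical names; only the "~1 HZ"
threshold is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* "~1 HZ" comes from the text; treating it as exactly 1.0 is an assumption. */
#define WATCHDOG_MIN_HZ 1.0

/* Hypothetical description of an already-active NMI-generating event. */
struct nmi_event {
    double rate_hz;  /* estimated rate at which this event raises NMIs */
};

/*
 * Return the index of the first event strong enough to drive the
 * watchdog, or -1 if none qualifies and a dedicated counter would have
 * to be installed instead.
 */
int watchdog_pick_source(const struct nmi_event *events, size_t n)
{
    assert(events != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (events[i].rate_hz >= WATCHDOG_MIN_HZ)
            return (int)i;
    }
    return -1;
}
```

An event that fires once an hour would be skipped by this check, and a
return value of -1 corresponds to the simple case of 'wasting' a counter
on NMI generation.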

	Ingo


* Re: introduce NMI_AUTO as nmi_watchdog option
  2010-01-13 16:25         ` Don Zickus
@ 2010-01-13 16:42           ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2010-01-13 16:42 UTC (permalink / raw)
  To: Don Zickus; +Cc: Ingo Molnar, Cyrill Gorcunov, aris, linux-kernel

On Wed, 2010-01-13 at 11:25 -0500, Don Zickus wrote:
> On Wed, Jan 13, 2010 at 02:13:42PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-01-13 at 10:32 +0100, Ingo Molnar wrote:
> > > other architectures have NMI concepts as well, such as Sparc64. 
> > 
> > I think both sparc64 and ppc64 fake NMIs by playing games with hw IRQ
> > priorities and partial masks. But yes.
> > 
> > One interesting 'feature' for the perf-nmi interaction is creating an
> > idle scheduling class for counters, because as long as there is a
> > counter present you can use its NMIs to drive the watchdog, but as soon
> > as there are none left, you need to install one.
> 
> Interesting idea.  How can I guarantee the frequency of the NMI I want to
> piggyback off of?  A breakpoint that takes an hour to trigger may not be
> the best NMI to use?  Then again I am still trying to understand the perf
> event code a little better.

You could play games with the period, we can handle getting more NMIs
than are needed. This is how we implement a period larger than the
physical counter for example.
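
The period trick can be sketched as follows: a long logical period is
split into chunks that fit the hardware counter, and the extra overflows
in between are simply absorbed. This is a userspace illustration, not
the perf implementation; struct soft_period, soft_period_arm() and
soft_period_overflow() are hypothetical names, and hw_max stands for
whatever the largest programmable count happens to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct soft_period {
    uint64_t period;  /* requested period, in events */
    uint64_t left;    /* events still to go before the logical overflow */
    uint64_t hw_max;  /* largest count one hardware programming can cover */
};

/* Return the count to program into the hardware counter next. */
uint64_t soft_period_arm(struct soft_period *sp)
{
    assert(sp != NULL && sp->period > 0 && sp->hw_max > 0);

    /* Start a fresh logical period once the previous one has elapsed. */
    if (sp->left == 0)
        sp->left = sp->period;

    return sp->left < sp->hw_max ? sp->left : sp->hw_max;
}

/*
 * Account for a hardware overflow after 'programmed' events.
 * Returns true only when the whole logical period has elapsed; the
 * intermediate overflows are the "more NMIs than are needed".
 */
bool soft_period_overflow(struct soft_period *sp, uint64_t programmed)
{
    assert(sp != NULL && programmed <= sp->left);

    sp->left -= programmed;
    return sp->left == 0;
}
```

For example, with period = 10 and hw_max = 4 the counter would be
programmed with 4, 4 and 2, and only the third overflow is reported as
the real event.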

But yeah, it's a tricky game since a tight loop might never generate the
event we're counting.. we could limit this to things like
cycles/ins/bus-cycles etc.. those will always tick.

Anyway, it's all an optimization, the simple/first implementation would
simply install a kernel cpu perf counter and hook the overflow handler.
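
What such an overflow handler might check can be sketched in plain C.
This is a userspace simulation, not kernel code: it does not use the
perf API, and struct cpu_watchdog, watchdog_tick(), watchdog_touch() and
watchdog_overflow() are hypothetical names. The assumption is that some
ordinary per-CPU activity (for instance a timer interrupt) bumps a
heartbeat, and the NMI-driven handler reports a lockup when the
heartbeat has not moved between two overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_watchdog {
    uint64_t heartbeat;  /* bumped by normal per-CPU progress */
    uint64_t last_seen;  /* heartbeat value at the previous overflow */
    bool touched;        /* set when code asks for one check to be skipped */
};

/* Called from ordinary (maskable) context to signal progress. */
void watchdog_tick(struct cpu_watchdog *wd)
{
    assert(wd != NULL);
    wd->heartbeat++;
}

/* Called by code that knowingly runs long with interrupts off. */
void watchdog_touch(struct cpu_watchdog *wd)
{
    assert(wd != NULL);
    wd->touched = true;
}

/* Counter overflow handler: returns true if the CPU looks stuck. */
bool watchdog_overflow(struct cpu_watchdog *wd)
{
    assert(wd != NULL);

    if (wd->touched) {
        /* Explicitly excused: resynchronise and skip this check. */
        wd->touched = false;
        wd->last_seen = wd->heartbeat;
        return false;
    }

    /* No progress since the previous overflow: report a lockup. */
    if (wd->heartbeat == wd->last_seen)
        return true;

    wd->last_seen = wd->heartbeat;
    return false;
}
```

In a real implementation the handler would run in NMI context and the
state would be per-CPU; both details are left out of this sketch.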


