public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3 v2] new nmi_watchdog using perf events
@ 2010-02-06  2:47 Don Zickus
  0 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2010-02-06  2:47 UTC (permalink / raw)
  To: mingo, peterz; +Cc: gorcunov, aris, linux-kernel, Don Zickus

This patch series tries to implement a new nmi_watchdog using the perf
events subsystem.  I am posting this series early for feedback on the
approach.  It isn't feature compatible with the old nmi_watchdog yet, nor
does it have all the old workarounds either.

The basic design is to create an in-kernel performance counter that goes off
every few seconds and checks for cpu lockups.  It is fairly straight
forward.  Some of the quirks are making sure the cpu lockup detection works
correctly.

It has been lightly tested for now.  Once people are ok with the approach,
I'll expand testing to more machines in our lab.

I tried taking a generic approach so all arches could use it if they want
and then implement some per arch specific hooks.  I believe this is what
Ingo was suggesting.  The initial work is based off of kernel/softlockup.c.

Any feedback would be great.

v2:
- moved a notify_die call into a #ifdef block
- used better default values for configuring the nmi_watchdog based on
  the old nmi_watchdog values

Cheers,
Don

--
damn it forgot to cc lkml

Don Zickus (3):
  [RFC][x86] move notify_die from nmi.c to traps.c
  [RFC] nmi_watchdog: new implementation using perf events
  [RFC] nmi_watchdog: config option to enable new nmi_watchdog

 arch/x86/kernel/apic/Makefile |    7 ++-
 arch/x86/kernel/apic/hw_nmi.c |  114 ++++++++++++++++++++++++
 arch/x86/kernel/apic/nmi.c    |    7 --
 arch/x86/kernel/traps.c       |    7 ++
 include/linux/nmi.h           |    4 +
 kernel/Makefile               |    1 +
 kernel/nmi_watchdog.c         |  196 +++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug             |   13 +++
 8 files changed, 341 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/kernel/apic/hw_nmi.c
 create mode 100644 kernel/nmi_watchdog.c


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3 v2] new nmi_watchdog using perf events
@ 2010-02-12 16:12 Stephane Eranian
  2010-02-12 16:59 ` Don Zickus
  0 siblings, 1 reply; 6+ messages in thread
From: Stephane Eranian @ 2010-02-12 16:12 UTC (permalink / raw)
  To: Don Zickus; +Cc: LKML, Peter Zijlstra, mingo, Paul Mackerras, Robert Richter

Don,

How is this new NMI watchdog code going to work when you also have OProfile
enabled in your kernel?

Today, perf_event disables the NMI watchdog while there is at least one event.
By releasing the PMU registers, it also allows for Oprofile to work.

But now with this new NMI watchdog code, perf_event never releases the PMU.
Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3 v2] new nmi_watchdog using perf events
  2010-02-12 16:12 [PATCH 0/3 v2] new nmi_watchdog using perf events Stephane Eranian
@ 2010-02-12 16:59 ` Don Zickus
  2010-02-12 17:12   ` Stephane Eranian
  0 siblings, 1 reply; 6+ messages in thread
From: Don Zickus @ 2010-02-12 16:59 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, Paul Mackerras, Robert Richter

On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
> Don,
> 
> How is this new NMI watchdog code going to work when you also have OProfile
> enabled in your kernel?
> 
> Today, perf_event disables the NMI watchdog while there is at least one event.
> By releasing the PMU registers, it also allows for Oprofile to work.
> 
> But now with this new NMI watchdog code, perf_event never releases the PMU.
> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.

You are right.  Originally when I read the code I thought perf_event just
grabbed all the PMUs in reserve_pmc_init().  But I see that only happens
when someone actually creates a PERF_TYPE_HARDWARE event, which the new
nmi watchdog does.  Those PMUs only get released when the event is
destroyed which my new code only does when the cpu disappears.

So yeah, I have effectively blocked oprofile from working.  I can change
my code such that when you disable the nmi_watchdog, you can release the
PMUs and let oprofile work.

But then I am curious, considering that perf and oprofile do the same
thing, how much longer do we let competing subsystems control the same
hardware?  I thought the point of the perf_event subsystem was to have a
proper framework on top of the PMUs such that anyone who wants to use it
just registers themselves, which is what the new nmi_watchdog is doing.

I can add code that allows oprofile and the new nmi watchdog to coexist,
but things get a little ugly to maintain.  Just wondering what the
gameplan is here?

Cheers,
Don


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3 v2] new nmi_watchdog using perf events
  2010-02-12 16:59 ` Don Zickus
@ 2010-02-12 17:12   ` Stephane Eranian
  2010-02-14  8:31     ` Ingo Molnar
  2010-02-15 20:04     ` Robert Richter
  0 siblings, 2 replies; 6+ messages in thread
From: Stephane Eranian @ 2010-02-12 17:12 UTC (permalink / raw)
  To: Don Zickus; +Cc: LKML, Peter Zijlstra, mingo, Paul Mackerras, Robert Richter

Don,

On Fri, Feb 12, 2010 at 5:59 PM, Don Zickus <dzickus@redhat.com> wrote:
> On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
>> Don,
>>
>> How is this new NMI watchdog code going to work when you also have OProfile
>> enabled in your kernel?
>>
>> Today, perf_event disables the NMI watchdog while there is at least one event.
>> By releasing the PMU registers, it also allows for Oprofile to work.
>>
>> But now with this new NMI watchdog code, perf_event never releases the PMU.
>> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
>> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.
>
> You are right.  Originally when I read the code I thought perf_event just
> grabbed all the PMUs in reserve_pmc_init().  But I see that only happens
> when someone actually creates a PERF_TYPE_HARDWARE event, which the new
> nmi watchdog does.  Those PMUs only get released when the event is
> destroyed which my new code only does when the cpu disappears.
>
> So yeah, I have effectively blocked oprofile from working.  I can change
> my code such that when you disable the nmi_watchdog, you can release the
> PMUs and let oprofile work.
>
> But then I am curious, considering that perf and oprofile do the same
> thing, how much longer do we let competing subsystems control the same
> hardware?  I thought the point of the perf_event subsystem was to have a
> proper framework on top of the PMUs such that anyone who wants to use it
> just registers themselves, which is what the new nmi_watchdog is doing.
>
> I can add code that allows oprofile and the new nmi watchdog to coexist,
> but things get a little ugly to maintain.  Just wondering what the
> gameplan is here?
>
I believe OProfile should eventually be removed from the kernel. I suspect
much of the functionalities it needs are already provided by perf_events.
But that does not mean the OProfile user level tool must disappear. There is
a very large user community. I think it could and should be ported to use
perf_events instead. Given that the Oprofile users only interact through
opcontrol, opreport, opannotate and such, they never "see" the actual kernel
API. Thus by re-targeting the scripts, this should be mostly transparent to
end-users.

But for now, I believe the most practical solution is to release the perf_event
event when you disable the NMI watchdog. That would at least provide a
way to run OProfile.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3 v2] new nmi_watchdog using perf events
  2010-02-12 17:12   ` Stephane Eranian
@ 2010-02-14  8:31     ` Ingo Molnar
  2010-02-15 20:04     ` Robert Richter
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2010-02-14  8:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Don Zickus, LKML, Peter Zijlstra, Paul Mackerras, Robert Richter,
	Arnaldo Carvalho de Melo, Thomas Gleixner


* Stephane Eranian <eranian@google.com> wrote:

> Don,
> 
> On Fri, Feb 12, 2010 at 5:59 PM, Don Zickus <dzickus@redhat.com> wrote:
> > On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
> >> Don,
> >>
> >> How is this new NMI watchdog code going to work when you also have OProfile
> >> enabled in your kernel?
> >>
> >> Today, perf_event disables the NMI watchdog while there is at least one event.
> >> By releasing the PMU registers, it also allows for Oprofile to work.
> >>
> >> But now with this new NMI watchdog code, perf_event never releases the PMU.
> >> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
> >> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.
> >
> > You are right. ??Originally when I read the code I thought perf_event just
> > grabbed all the PMUs in reserve_pmc_init(). ??But I see that only happens
> > when someone actually creates a PERF_TYPE_HARDWARE event, which the new
> > nmi watchdog does. ??Those PMUs only get released when the event is
> > destroyed which my new code only does when the cpu disappears.
> >
> > So yeah, I have effectively blocked oprofile from working. ??I can change
> > my code such that when you disable the nmi_watchdog, you can release the
> > PMUs and let oprofile work.
> >
> > But then I am curious, considering that perf and oprofile do the same
> > thing, how much longer do we let competing subsystems control the same
> > hardware? ??I thought the point of the perf_event subsystem was to have a
> > proper framework on top of the PMUs such that anyone who wants to use it
> > just registers themselves, which is what the new nmi_watchdog is doing.
> >
> > I can add code that allows oprofile and the new nmi watchdog to coexist,
> > but things get a little ugly to maintain. ??Just wondering what the
> > gameplan is here?
>
> I believe OProfile should eventually be removed from the kernel. I suspect 
> much of the functionalities it needs are already provided by perf_events. 
> But that does not mean the OProfile user level tool must disappear. There 
> is a very large user community. I think it could and should be ported to 
> use perf_events instead. Given that the Oprofile users only interact 
> through opcontrol, opreport, opannotate and such, they never "see" the 
> actual kernel API. Thus by re-targeting the scripts, this should be mostly 
> transparent to end-users.

Agreed, i suggested this a few months ago. (In theory the oprofile user-space 
tooling might even use something like libpapi or the perfmon library, instead 
of directly binding to the perf syscall.)

Until that is done we can keep the parallel oprofile infrastructure i guess, 
as a legacy option.

Then again, when it comes to profiling, i cannot see anyone not wanting to 
use perf itself, it's so much easier to use (and more capable) than the 
oprofile tooling in pretty much any area. Basically all instrumentation and 
profiling development happens in that space these days as well.

Those few places where you cannot use perf yet (on weird or old CPUs) are 
being filled in quickly as well - so it might be easier to just widen PMU 
support to all relevant places instead of spending effort on oprofile 
compatibility.

> But for now, I believe the most practical solution is to release the 
> perf_event event when you disable the NMI watchdog. That would at least 
> provide a way to run OProfile.

Agreed.

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3 v2] new nmi_watchdog using perf events
  2010-02-12 17:12   ` Stephane Eranian
  2010-02-14  8:31     ` Ingo Molnar
@ 2010-02-15 20:04     ` Robert Richter
  1 sibling, 0 replies; 6+ messages in thread
From: Robert Richter @ 2010-02-15 20:04 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Don Zickus, LKML, Peter Zijlstra, mingo, Paul Mackerras

On 12.02.10 18:12:47, Stephane Eranian wrote:
> Don,
> 
> On Fri, Feb 12, 2010 at 5:59 PM, Don Zickus <dzickus@redhat.com> wrote:
> > On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
> >> Don,
> >>
> >> How is this new NMI watchdog code going to work when you also have OProfile
> >> enabled in your kernel?
> >>
> >> Today, perf_event disables the NMI watchdog while there is at least one event.
> >> By releasing the PMU registers, it also allows for Oprofile to work.
> >>
> >> But now with this new NMI watchdog code, perf_event never releases the PMU.
> >> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
> >> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.
> >
> > You are right.  Originally when I read the code I thought perf_event just
> > grabbed all the PMUs in reserve_pmc_init().  But I see that only happens
> > when someone actually creates a PERF_TYPE_HARDWARE event, which the new
> > nmi watchdog does.  Those PMUs only get released when the event is
> > destroyed which my new code only does when the cpu disappears.
> >
> > So yeah, I have effectively blocked oprofile from working.  I can change
> > my code such that when you disable the nmi_watchdog, you can release the
> > PMUs and let oprofile work.
> >
> > But then I am curious, considering that perf and oprofile do the same
> > thing, how much longer do we let competing subsystems control the same
> > hardware?  I thought the point of the perf_event subsystem was to have a
> > proper framework on top of the PMUs such that anyone who wants to use it
> > just registers themselves, which is what the new nmi_watchdog is doing.

There is the perfctr reservation framework what is used by all
subsystems. Perf reserves all counters if there is one event actively
running. This is ok as long you use perf from the userspace for
profiling. Nobody uses 2 different profilers at the same time. But if
the counters are also for implementing in-kernel features such as a
watchdog that is enabled all the time, perf must be modified to only
allocate those counters that are actually needed, and events may not
be scheduled on counters that are already reserved.

> > I can add code that allows oprofile and the new nmi watchdog to coexist,
> > but things get a little ugly to maintain.  Just wondering what the
> > gameplan is here?

There is no longer kernel feature implementation for oprofile. But it
will be still in the kernel for a while until we can completely switch
to perf. Perf is improving very fast, compared to the ongoing
development the implementation effort for coexistence is small. So I
think we all can spend some time to also improve the counter
reservation code.

> I believe OProfile should eventually be removed from the kernel. I suspect
> much of the functionalities it needs are already provided by perf_events.
> But that does not mean the OProfile user level tool must disappear. There is
> a very large user community. I think it could and should be ported to use
> perf_events instead. Given that the Oprofile users only interact through
> opcontrol, opreport, opannotate and such, they never "see" the actual kernel
> API. Thus by re-targeting the scripts, this should be mostly transparent to
> end-users.

I think, porting the oprofile userland to work on top of a performance
library (libpapi or libpfm) would be the cleanest solution. Alternativly
we could also port the kernel part to use the in-kernel perf api.

> 
> But for now, I believe the most practical solution is to release the perf_event
> event when you disable the NMI watchdog. That would at least provide a
> way to run OProfile.

This solution is fine to me. The current implemenation also has some
limitations for oprofile if the watchdog is enabled.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-02-15 20:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-12 16:12 [PATCH 0/3 v2] new nmi_watchdog using perf events Stephane Eranian
2010-02-12 16:59 ` Don Zickus
2010-02-12 17:12   ` Stephane Eranian
2010-02-14  8:31     ` Ingo Molnar
2010-02-15 20:04     ` Robert Richter
  -- strict thread matches above, loose matches on Subject: below --
2010-02-06  2:47 Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox