[PATCH] perf wrong branches event on AMD

* [PATCH] perf wrong branches event on AMD
@ 2010-07-01 19:30 Vince Weaver
  2010-07-01 19:54 ` Arnaldo Carvalho de Melo
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Vince Weaver @ 2010-07-01 19:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo

Hello

while doing some performance counter validation tests on some assembly 
language programs I noticed that the "branches:u" count was very wrong on 
AMD machines.

It looks like the wrong event was selected.

This is why event selection needs to be in user-space... it could be fixed 
instantly there, but the way things are done now it will take months to 
years for this fix to filter down to those trying to use perf counters...

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 611df11..c2897b7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -102,8 +102,8 @@ static const u64 amd_perfmon_event_map[] =
   [PERF_COUNT_HW_INSTRUCTIONS]		= 0x00c0,
   [PERF_COUNT_HW_CACHE_REFERENCES]	= 0x0080,
   [PERF_COUNT_HW_CACHE_MISSES]		= 0x0081,
-  [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c4,
-  [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c5,
+  [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c2,
+  [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c3,
 };
 
 static u64 amd_pmu_event_map(int hw_event)
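
As a rough illustration of what this table does (a sketch, not the kernel
source): a generic event index selects a raw AMD event code, and the patch
changes two of those codes. The enum names and the lookup helper below are
hypothetical stand-ins; only the hex values are taken from the diff above.

```c
#include <stdint.h>

/* Hypothetical local ids; these are NOT the kernel's PERF_COUNT_HW_* values. */
enum hw_event {
	HW_INSTRUCTIONS,
	HW_CACHE_REFERENCES,
	HW_CACHE_MISSES,
	HW_BRANCH_INSTRUCTIONS,
	HW_BRANCH_MISSES,
	HW_EVENT_MAX
};

/* Raw AMD event codes as they stand after the patch is applied. */
const uint64_t amd_event_map_fixed[HW_EVENT_MAX] = {
	[HW_INSTRUCTIONS]        = 0x00c0,
	[HW_CACHE_REFERENCES]    = 0x0080,
	[HW_CACHE_MISSES]        = 0x0081,
	[HW_BRANCH_INSTRUCTIONS] = 0x00c2,	/* was 0x00c4 */
	[HW_BRANCH_MISSES]       = 0x00c3,	/* was 0x00c5 */
};

/* Map a generic event id to its raw code; returns 0 for an unknown id. */
uint64_t amd_event_lookup(int hw_event)
{
	if (hw_event < 0 || hw_event >= HW_EVENT_MAX)
		return 0;
	return amd_event_map_fixed[hw_event];
}
```

Because the mapping is a plain table lookup, a wrong entry still yields a
valid counter, just one that counts the wrong thing.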


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-01 19:30 [PATCH] perf wrong branches event on AMD Vince Weaver
@ 2010-07-01 19:54 ` Arnaldo Carvalho de Melo
  2010-07-02 11:38 ` Peter Zijlstra
  2010-07-03 13:58 ` [tip:perf/urgent] perf, x86: Fix incorrect branches event on AMD CPUs tip-bot for Vince Weaver
  2 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-07-01 19:54 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, LKML, Peter Zijlstra, Paul Mackerras

On Thu, Jul 01, 2010 at 03:30:16PM -0400, Vince Weaver wrote:
> Hello
> 
> while doing some performance counter validation tests on some assembly 
> language programs I noticed that the "branches:u" count was very wrong on 
> AMD machines.
> 
> It looks like the wrong event was selected.
> 
> This is why event selection needs to be in user-space... it could be fixed 
> instantly there, but the way things are done now it will take months to 
> years for this fix to filter down to those trying to use perf counters...

Well, you can use:

  rNNN (see 'perf list --help' on how to encode it) [Raw hardware event descriptor]

In the meantime.

- Arnaldo
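
For the events in this thread the raw descriptors would presumably be r00c2
(branches) and r00c3 (branch misses), used as something like
"perf stat -e r00c2:u". A minimal sketch of how such a descriptor is built
follows; it assumes the usual AMD counter layout (event select bits 7:0 in
config bits 7:0, unit mask in bits 15:8, event select bits 11:8 in bits
35:32), and both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack an AMD event select and unit mask into a raw config value.
 * Assumed layout: event[7:0] -> bits 7:0, umask -> bits 15:8,
 * event[11:8] -> bits 35:32. */
uint64_t amd_raw_config(unsigned int event, unsigned int umask)
{
	return (uint64_t)(event & 0xff) |
	       ((uint64_t)(umask & 0xff) << 8) |
	       ((uint64_t)(event & 0xf00) << 24);
}

/* Write the "rNNN" form of a raw config into buf.
 * Returns what snprintf returns (length that would have been written). */
int perf_raw_descriptor(char *buf, size_t len, uint64_t config)
{
	return snprintf(buf, len, "r%04llx", (unsigned long long)config);
}
```

With event 0xc2 and unit mask 0 this gives the 0x00c2 value from the patch.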


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-01 19:30 [PATCH] perf wrong branches event on AMD Vince Weaver
  2010-07-01 19:54 ` Arnaldo Carvalho de Melo
@ 2010-07-02 11:38 ` Peter Zijlstra
  2010-07-02 13:56   ` Vince Weaver
  2010-07-03 13:58 ` [tip:perf/urgent] perf, x86: Fix incorrect branches event on AMD CPUs tip-bot for Vince Weaver
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2010-07-02 11:38 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, LKML, Paul Mackerras, Arnaldo Carvalho de Melo

On Thu, 2010-07-01 at 15:30 -0400, Vince Weaver wrote:
> This is why event selection needs to be in user-space... it could be fixed 
> instantly there, but the way things are done now it will take months to 
> years for this fix to filter down to those trying to use perf counters...

Last time I checked apt-get upgrade/yum upgrade simply upgraded
everything, including kernels.. (and upgrades to userspace packages can
take months too)

Someone needs to build a new package and publish it, upgrading the
kernel is no harder than upgrading any other.

If you don't want to reboot, there's the -r option Arnaldo already
mentioned. There's also the option of writing a kernel module to poke at
the data table if you really really want to update a running kernel.

If you think this is a really "important" feature you could even make a
patch that exposes all these data tables to userspace through sysfs or
whatever and see if people think it's worth the effort.

Personally I don't think people will ever use such a sysfs interface,
but hey, that's my opinion.


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-02 11:38 ` Peter Zijlstra
@ 2010-07-02 13:56   ` Vince Weaver
  2010-07-02 14:23     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Vince Weaver @ 2010-07-02 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Paul Mackerras, Arnaldo Carvalho de Melo

On Fri, 2 Jul 2010, Peter Zijlstra wrote:

> On Thu, 2010-07-01 at 15:30 -0400, Vince Weaver wrote:
> > This is why event selection needs to be in user-space... it could be fixed 
> > instantly there, but the way things are done now it will take months to 
> > years for this fix to filter down to those trying to use perf counters...
> 
> Last time I checked apt-get upgrade/yum upgrade simply upgraded
> everything, including kernels.. (and upgrades to userspace packages can
> take months too)

The system I have this problem on is a large server used by many people, 
and has some fiddly hardware.  The admins don't take kernel upgrades 
lightly.

User-space libraries can be installed in my home directory, by me, with 
no changes to anyone else using the system.

> If you don't want to reboot, there's the -r option Arnaldo already
> mentioned. There's also the option of writing a kernel module to poke at
> the data table if you really really want to update a running kernel.

You think I have root on this machine?

In any case, yes, there's the "-r" option.  Fun.  I get to modify all my 
scripts to have some complicated case... "if AMD machine and if kernel is 
new enough"... how new?  It gets confusing once things get backported to 
stable.  As far as I know there's no way to get a kernel to spit out what 
raw events it's using for the predefined events.

Plus, the branches:u result is giving a *wrong* event with wrong values, 
not just a 0 count which might be suspicious.

If the solution really is to use raw events in a case like this, I 
question why the predefined events are in the kernel at all.  Pretty much 
anyone using the branches event on an AMD machine is going to be getting 
wrong results for all kernels between 2.6.31 and 2.6.35 and not even know 
it unless they read the kernel list.

> If you think this is a really "important" feature you could even make a
> patch that exposes all these data tables to userspace through sysfs or
> whatever and see if people think its worth the effort.

such a library already exists, called libpfm4.  I use it when I can.

Unfortunately perf is widely used and comes with the kernel.

Vince


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-02 13:56   ` Vince Weaver
@ 2010-07-02 14:23     ` Peter Zijlstra
  2010-07-02 19:52       ` Vince Weaver
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2010-07-02 14:23 UTC (permalink / raw)
  To: Vince Weaver; +Cc: Ingo Molnar, LKML, Paul Mackerras, Arnaldo Carvalho de Melo

On Fri, 2010-07-02 at 09:56 -0400, Vince Weaver wrote:
> You think I have root on this machine?

Well yeah,.. I'd not want a dev job and not have full access to the
hardware. But then, maybe I'm picky.

> In any case, yes, there's the "-r" option.  Fun.  I get to modify all my 
> scripts to have some complicated case... "if AMD machine and if kernel is 
> new enough"... how new?  It gets confusing once things get backported to 
> stable.  As far as I know there's no way to get a kernel to spit out what 
> raw events it's using for the predefined events.

You can stick the knowledge in perf if you really want to.. something
like the below, add something that parses cpuid or /proc/cpuinfo and you
should be good.

---
 tools/perf/util/parse-events.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..800f864 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -771,6 +771,22 @@ parse_event_symbols(const char **str, struct perf_event_attr *attr)
 modifier:
 	parse_event_modifier(str, attr);
 
+	if (attr->type == PERF_TYPE_HARDWARE) {
+		switch (attr->config) {
+		case PERF_COUNT_HW_BRANCH_INSTRUCTIONS:
+			attr->type = PERF_TYPE_RAW;
+			attr->config = 0x00c2;
+			break;
+		case PERF_COUNT_HW_BRANCH_MISSES:
+			attr->type = PERF_TYPE_RAW;
+			attr->config = 0x00c3;
+			break;
+
+		default:
+			break;
+		}
+	}
+
 	return ret;
 }
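
As posted, the hunk above remaps the two branch events unconditionally, so it
still needs the vendor check mentioned before the patch. One possible sketch
of that check, assuming /proc/cpuinfo carries a "vendor_id" line whose value
is "AuthenticAMD" on AMD CPUs (the helper name is hypothetical, and the
caller is expected to have read the file into a string already):

```c
#include <stdbool.h>
#include <string.h>

/* Return true if /proc/cpuinfo-style text reports an AMD CPU.
 * Only the first line starting with "vendor_id" is examined. */
bool cpuinfo_is_amd(const char *text)
{
	static const char amd[] = "AuthenticAMD";
	const size_t n = sizeof(amd) - 1;
	const char *line = text;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len >= 9 && strncmp(line, "vendor_id", 9) == 0) {
			const char *end = line + len;
			const char *p;

			/* Look for the vendor string within this line only. */
			for (p = line; p + n <= end; p++)
				if (memcmp(p, amd, n) == 0)
					return true;
			return false;
		}
		line = eol ? eol + 1 : NULL;
	}
	return false;
}
```

The remapping in the patch would then be wrapped in a test on this result, so
that non-AMD machines keep using the generic events.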
 
> If the solution really is to use raw events in a case like this, I 
> question why the predefined events are in the kernel at all.  Pretty much 
> anyone using the branches event on an AMD machine is going to be getting 
> wrong results for all kernels between 2.6.31 and 2.6.35 and not even know 
> it unless they read the kernel list. 

And how would it be different if the data table lived in userspace?
They'd still get the wrong thing unless they updated.


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-02 14:23     ` Peter Zijlstra
@ 2010-07-02 19:52       ` Vince Weaver
  2010-07-03 13:54         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Vince Weaver @ 2010-07-02 19:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Paul Mackerras, Arnaldo Carvalho de Melo

On Fri, 2 Jul 2010, Peter Zijlstra wrote:

> On Fri, 2010-07-02 at 09:56 -0400, Vince Weaver wrote:
> > You think I have root on this machine?
> 
> Well yeah,.. I'd not want a dev job and not have full access to the
> hardware. But then, maybe I'm picky.

I can see how this support call would go now.

  Me:  Hello, I need you to upgrade the kernel on the
       2.332 petaflop machine with 37,376 processors 
       so I can have the right branch counter on perf.
  Them: Umm... no.
  Me:  Well then can I have root so I can patch
       the kernel on the fly?
  Them: <click>

As a performance counter library developer, it is a bit frustrating having 
to keep a compatibility matrix in my head of all the perf events 
shortcomings.  Especially since the users tend not to have admin
access on their machines.  Need to have at least 2.6.33 if you want
multiplexing.  Need to have 2.6.34 if you want Nehalem-EX.  Need 2.6.35
if you want Pentium 4.  Unfortunately most vendor kernels are stuck at 
2.6.32 :(  Now I'll have to remember whatever kernel the AMD branches
fix is committed at.  And there still isn't the Uncore support that 
everyone is clamoring for.
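
The compatibility matrix described above could be written down roughly as
follows. This is only a sketch: the struct and function names are
hypothetical, the version numbers are the ones listed in this mail, and a
plain version comparison cannot see fixes that were backported to stable or
vendor kernels.

```c
#include <stdbool.h>
#include <stdio.h>

struct kver {
	int major, minor, patch;
};

struct feature_req {
	const char *feature;
	struct kver min;
};

/* Minimum upstream versions as listed in the mail above. */
const struct feature_req perf_feature_reqs[] = {
	{ "multiplexing", { 2, 6, 33 } },
	{ "Nehalem-EX",   { 2, 6, 34 } },
	{ "Pentium 4",    { 2, 6, 35 } },
};

/* Parse the leading "X.Y.Z" of a release string such as "2.6.32-5-amd64". */
bool parse_kver(const char *release, struct kver *out)
{
	return sscanf(release, "%d.%d.%d",
		      &out->major, &out->minor, &out->patch) == 3;
}

/* True if 'have' is the same as or newer than 'need'. */
bool kver_at_least(struct kver have, struct kver need)
{
	if (have.major != need.major)
		return have.major > need.major;
	if (have.minor != need.minor)
		return have.minor > need.minor;
	return have.patch >= need.patch;
}
```

A 2.6.32 vendor kernel fails all three entries in this table, which is the
situation described above.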

> You can stick the knowledge in perf if you really want to.. something
> like the below, add something that parses cpuid or /proc/cpuinfo and you
> should be good.

again though, doesn't this defeat the purpose of the whole idea of common 
named events?

> And how would it be different if it the data table lived in userspace?
> They'd still get the wrong thing unless they updated.

because compiling and running an updated user-version of a library is 
possible.  You can compile it in your home directory and link your tools 
against it.  No need to bug the sysadmin.  You can even use LD_PRELOAD
to get other apps to link against it.  Getting a kernel update installed 
is many orders of magnitude harder out in the real world.

Vince


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-02 19:52       ` Vince Weaver
@ 2010-07-03 13:54         ` Ingo Molnar
  2010-07-04  0:30           ` David Dillow
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2010-07-03 13:54 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, LKML, Paul Mackerras, Arnaldo Carvalho de Melo

* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Fri, 2 Jul 2010, Peter Zijlstra wrote:
> 
> > On Fri, 2010-07-02 at 09:56 -0400, Vince Weaver wrote:
> > > You think I have root on this machine?
> > 
> > Well yeah,.. I'd not want a dev job and not have full access to the
> > hardware. But then, maybe I'm picky.
> 
> I can see how this support call would go now.
> 
>   Me:  Hello, I need you to upgrade the kernel on the
>        2.332 petaflop machine with 37,376 processors 
>        so I can have the right branch counter on perf.
>   Them: Umm... no.
>   Me:  Well then can I have root so I can patch
>        the kernel on the fly?
>   Them: <click>

No, the way it would go, for this particular bug you reported, is something 
like:

    Me:   Hello, I need you to upgrade the kernel on the
          2.332 petaflop machine with 37,376 processors 
          so I can have the right branch counter on perf.

    Them: Please wait for the next security/stability update of
          the 2.6.32 kernel.

    Me:   Thanks.

Because i marked this fix for a -stable backport so it will automatically 
propagate into all currently maintained stable kernels.

> As a performance counter library developer, it is a bit frustrating having 
> to keep a compatibility matrix in my head of all the perf events 
> shortcomings.  Especially since the users tend not to have admin access on 
> their machines.  Need to have at least 2.6.33 if you want multiplexing.  

Admins of restrictive environments are very reluctant to update _any_ system 
component, not just the kernel - and that includes instrumentation 
tools/libraries.

In fact often the kernel gets updated more frequently, because it's so 
central.

The solution for that is to not use restrictive environments with obsolete 
tools for bleeding-edge development - or to wait until the features you rely 
on trickle down to that environment as well.

Also, our design targets far more developers than just those who are willing 
to download the latest library and are willing to use LD_PRELOAD or other 
tricks. In reality most developers will wait for updates if there's a bug in 
the tool they are using.

You are a special case of a special case - _and_ you are limiting yourself by 
being willing to update everything _but_ the kernel.

Anyway, our design results out of our first-hand experience of laggy updates 
and limited capabilities of a user-centric performance-analysis library, and 
we wrote perf events to address those problems.

Claiming that we need a user-space-centric approach for the special case where 
you exclude the kernel from the components that may be updated in a system 
doesn't look like a strong reason to change the design.

> Need to have 2.6.34 if you want Nehalem-EX.  Need 2.6.35 if you want Pentium 
> 4. [...]

You wouldn't have gotten that any faster with a more user-space centric design 
either. Something like Pentium-4 support needs kernel help. So if you are 
stuck with an old kernel you won't have it - no matter what approach is used.

> [...] Now I'll have to remember whatever kernel the AMD branches fix is 
> committed at.  And there still isn't the Uncore support that everyone is 
> clamoring for.

You are very much welcome to help out with uncore events, if you are 
interested in them. The reason why they aren't there yet is that so far 
people were more interested in adding support for, say, Pentium-4 events than 
in adding uncore events. If you want to change that then you either need to 
convince developers to implement it, or you need to do it yourself.

> > You can stick the knowledge in perf if you really want to.. something like 
> > the below, add something that parses cpuid or /proc/cpuinfo and you should 
> > be good.
> 
> again though, doesn't this defeat the purpose of the whole idea of common 
> named events?

You claimed there's no solution if a kernel update is not possible. 
Peter gave you such a solution and you are now claiming that it's no good 
because the better solution is to update the kernel? That argument seems 
either somewhat circular or somewhat contradictory.

> > And how would it be different if it the data table lived in userspace? 
> > They'd still get the wrong thing unless they updated.
> 
> because compiling and running an updated user-version of a library is 
> possible.  You can compile it in your home directory and link your tools 
> against it.  No need to bug the sysadmin.  You can even use LD_PRELOAD to 
> get other apps to link against it.  Getting a kernel update installed is 
> many orders of magnitude harder out in the real world.

Even in the restrictive-environment worst-case situation you mention (which, 
btw., is not that common at all), the kernel gets updated in a timely manner, 
with stability and security fixes. Your fix will be in .32-stable once it hits 
upstream.

And have you considered the counter argument: that with pure user-space 
libraries and tables there's a higher likelihood that people will just sit on 
their fixes - because it's so easy to update the tables or add small hacks to 
fix the library?

With the perf events support code in the kernel they are encouraged to work 
with us and are encouraged to submit fixes - which will reach a far larger 
audience that way.

So our model will [obviously] lead to slower updates in those special 
situations where a super-high-end performance developer can do everything but 
upgrade the kernel, but otherwise common code is good for pretty much everyone 
else. We hurt more if it's buggy or incomplete, but in turn this creates 
pressure to keep that code correct.

Thanks,

	Ingo


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-03 13:54         ` Ingo Molnar
@ 2010-07-04  0:30           ` David Dillow
  2010-07-04  9:11             ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: David Dillow @ 2010-07-04  0:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, Peter Zijlstra, LKML, Paul Mackerras,
	Arnaldo Carvalho de Melo

On Sat, 2010-07-03 at 15:54 +0200, Ingo Molnar wrote:
> * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> 
> > On Fri, 2 Jul 2010, Peter Zijlstra wrote:
> > 
> > > On Fri, 2010-07-02 at 09:56 -0400, Vince Weaver wrote:
> > > > You think I have root on this machine?
> > > 
> > > Well yeah,.. I'd not want a dev job and not have full access to the
> > > hardware. But then, maybe I'm picky.
> > 
> > I can see how this support call would go now.
> > 
> >   Me:  Hello, I need you to upgrade the kernel on the
> >        2.332 petaflop machine with 37,376 processors 
> >        so I can have the right branch counter on perf.
> >   Them: Umm... no.
> >   Me:  Well then can I have root so I can patch
> >        the kernel on the fly?
> >   Them: <click>
> 
> No, the way it would go, for this particular bug you reported, is something 
> like:
> 
>     Me:   Hello, I need you to upgrade the kernel on the
>           2.332 petaflop machine with 37,376 processors 
>           so I can have the right branch counter on perf.
> 
>     Them: Please wait for the next security/stability update of
>           the 2.6.32 kernel.
> 
>     Me:   Thanks.

You're both funny, though Vince is closer to reality for the scale of
machines he's talking about. The vendor kernel on these behemoths is a
patched SLES11 kernel based on 2.6.18, and paint does indeed dry faster
than changes to that kernel occur.

It pains me that this is the case, but the vendor doesn't have the
resources to keep up-to-date, and even if they did, it's not clear that
the users would want them to do so -- you take risk with the changes,
and a small performance regression can end up costing them hundreds of
thousands of CPU-hours, which is a problem when you have a budget in the low
millions -- all of which are needed to reach your science goals. Sure,
you may get some improvements, but there's risk.

> Because i marked this fix for a -stable backport so it will automatically 
> propagate into all currently maintained stable kernels.

That's wonderful, but doesn't address the situation Vince finds himself
in, and he's not alone. We just don't get kernel updates, as much as we
might like to. If the behavior is in user-space, then the library
developers can fix it quickly, and users can pull it into their
applications without waiting for a scheduled maintenance period. We try
not to take maintenance periods unless we need to clean up hardware
issues, as the primary function of the machine is CPU-hours for science
runs. It takes an hour or more to reboot the machine without needing to
perform any software updates, and that hour equals 224,000 CPU-hours
that could be better spent.

> > As a performance counter library developer, it is a bit frustrating having 
> > to keep a compatibility matrix in my head of all the perf events 
> > shortcomings.  Especially since the users tend not to have admin access on 
> > their machines.  Need to have at least 2.6.33 if you want multiplexing.  
> 
> Admins of restrictive environments are very reluctant to update _any_ system 
> component, not just the kernel - and that includes instrumentation 
> tools/libraries.
> 
> In fact often the kernel gets updated more frequently, because it's so 
> central.

Quite the reverse here, we update compilers and libraries quite often,
and we have a system in place that keeps the old versions in place.
There are often odd interdependencies between the libraries, and
particular science applications often require a specific version to run.
Upgrading libraries is fairly painless for us, and we can do it without
making the system unavailable to users.

> The solution for that is to not use restrictive environments with obsolete 
> tools for bleeding-edge development - or to wait until the features you rely 
> on trickle down to that environment as well.

Unfortunately, bleeding-edge high-performance computing requires running
in the vendor-supported environment, restrictive as it may be. There's
nowhere else that you can run an application that requires scaling up
to that many processors and that large a memory footprint.

> Also, our design targets far more developers than just those who are willing 
> to download the latest library and are willing to use LD_PRELOAD or other 
> tricks. In reality most developers will wait for updates if there's a bug in 
> the tool they are using.
> 
> You are a special case of a special case - _and_ you are limiting yourself by 
> being willing to update everything _but_ the kernel.

We're limiting ourselves by expecting to get support from the vendor
after paying many millions for the machine, and the vendor just doesn't
move very quickly in kernel space. I could probably make HEAD run on the
machine with some hacking on the machine specific device drivers, but
it'd never see production use -- it would void support and that's a
deal-killer.

Note that I'm not arguing for a design change -- I'm just trying to give
you some background on why people in the high-performance computing
sector keep saying how much easier it is for them if they can fix issues
with a new library rather than a new kernel.

Once the (very) downstream vendors catch up to a baseline kernel with
perf in it, fixing bugs like this will require at least partial machine
downtimes or rolling upgrades with ksplice. Both of those mechanisms
have their own drawbacks and will require an increased candy supply to
keep the system admins from picking up pitchforks. :)
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


* Re: [PATCH] perf wrong branches event on AMD
  2010-07-04  0:30           ` David Dillow
@ 2010-07-04  9:11             ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2010-07-04  9:11 UTC (permalink / raw)
  To: David Dillow
  Cc: Vince Weaver, Peter Zijlstra, LKML, Paul Mackerras,
	Arnaldo Carvalho de Melo

* David Dillow <dillowda@ornl.gov> wrote:

> On Sat, 2010-07-03 at 15:54 +0200, Ingo Molnar wrote:
> > * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> > 
> > > On Fri, 2 Jul 2010, Peter Zijlstra wrote:
> > > 
> > > > On Fri, 2010-07-02 at 09:56 -0400, Vince Weaver wrote:
> > > > > You think I have root on this machine?
> > > > 
> > > > Well yeah,.. I'd not want a dev job and not have full access to the
> > > > hardware. But then, maybe I'm picky.
> > > 
> > > I can see how this support call would go now.
> > > 
> > >   Me:  Hello, I need you to upgrade the kernel on the
> > >        2.332 petaflop machine with 37,376 processors 
> > >        so I can have the right branch counter on perf.
> > >   Them: Umm... no.
> > >   Me:  Well then can I have root so I can patch
> > >        the kernel on the fly?
> > >   Them: <click>
> > 
> > No, the way it would go, for this particular bug you reported, is something 
> > like:
> > 
> >     Me:   Hello, I need you to upgrade the kernel on the
> >           2.332 petaflop machine with 37,376 processors 
> >           so I can have the right branch counter on perf.
> > 
> >     Them: Please wait for the next security/stability update of
> >           the 2.6.32 kernel.
> > 
> >     Me:   Thanks.
> 
> You're both funny, though Vince is closer to reality for the scale of 
> machines he's talking about. The vendor kernel on these behemoths is a 
> patched SLES11 kernel based on 2.6.18, and paint does indeed dry faster than 
> changes to that kernel occur.

Well, i replied to the hypothetical posed in the mail, which presumed v2.6.32.

Note that a v2.6.18 box won't have perf events in any case [or any recent 
kernel feature] - they were introduced more than a year ago in v2.6.31.

Of course in real life there are even 2.6.9 based machines out there. There's 
some 2.4 leftovers as well. Life can be arbitrarily weird, for various good 
(and some not so good) reasons.

> > In fact often the kernel gets updated more frequently, because it's so 
> > central.
> 
> Quite the reverse here, we update compilers and libraries quite often, and 
> we have a system in place that keeps the old versions in place.

If you stipulate that you can upgrade anything but the component where a 
significant chunk of perf events logic lives (the kernel) then of course it 
will fail the comparison.

Our point is that it is not how most systems and most developers operate and 
that there are significant, well-proven advantages to the in-kernel model.

And the thing is, for 10 years performance monitoring under Linux was designed 
precisely in the way that was friendly to the 'impossible to upgrade the 
kernel' scenario you outlined - so it's not like we made a random design 
choice.

Still it got virtually nowhere in those 10 years and produced utterly 
incapable software - at least as far as kernel developers are concerned and we 
are trying a different angle now. If you want to help with the user-space 
centric design that Vince worked on then you can help him replace our design. 
I will certainly be glad to merge superior code.

Right now i dont see how that would be possible, having seen both approaches 
first-hand - but i'm ready to be surprised with code.

Or you can lobby your vendors to be more up to date with the kernel. We upstream 
kernel developers are lobbying them too - it's a good thing to do in any case.

> There are often odd interdependencies between the libraries, and particular 
> science applications often require a specific version to run. Upgrading 
> libraries is fairly painless for us, and we can do it without making the 
> system unavailable to users.
> 
> > The solution for that is to not use restrictive environments with obsolete 
> > tools for bleeding-edge development - or to wait until the features you 
> > rely on trickle down to that environment as well.
> 
> Unfortunately, bleeding-edge high-performance computing requires running in 
> the vendor-supported environment, restrictive as it may be. There's no where 
> else that you can run an application that requires scaling up to that many 
> processors and memory footprint.

Well, 'restrictive, vendor-supplied software' is pretty much the opposite of 
what Linux is about, and for good reasons. It may work for you but i would not 
expect miracles - the two models dont mix very well.

By staying on v2.6.18 or older you will miss out on a lot of other nice kernel 
enhancements, not just perf events. Just in v2.6.32 we made a lot of other 
scalability enhancements to insanely-large hardware. If your vendor is still 
on v2.6.18 then you'll be hurting in a lot of places on sufficiently large 
hardware.

v2.6.18 is a nearly 4 years old kernel.

> > Also, our design targets far more developers than just those who are 
> > willing to download the latest library and are willing to use LD_PRELOAD 
> > or other tricks. In reality most developers will wait for updates if 
> > there's a bug in the tool they are using.
> > 
> > You are a special case of a special case - _and_ you are limiting yourself 
> > by being willing to update everything _but_ the kernel.
> 
> We're limiting ourselves by expecting to get support from the vendor after 
> paying many millions for the machine, and the vendor just doesn't move very 
> quickly in kernel space. I could probably make HEAD run on the machine with 
> some hacking on the machine specific device drivers, but it'd never see 
> production use -- it would void support and that's a deal-killer.

Looks like a possible market opening for a vendor with a better update 
frequency.

Also note the specific event table bug this thread is about: the fix is 
trivial, and any sane vendor, even if based on an old kernel, should be able 
to adopt it very quickly - just as quickly as they adopt security fixes.

And if you cannot wait for that, PeterZ posted a patch showing how you can 
redirect user-space towards a raw event of your choice.

> Note that I'm not arguing for a design change -- I'm just trying to give you 
> some background on why people in the high-performance computing sector keep 
> saying how much easier it is for them if they can fix issues with a new 
> library rather than a new kernel.
> 
> Once the (very) downstream vendors catch up to a baseline kernel with
> perf in it, fixing bugs like this will require at least partial machine
> downtimes or rolling upgrades with ksplice. Both of those mechanisms
> have their own drawbacks and will require an increased candy supply to
> keep the system admins from picking up pitchforks. :)

I can feel your pain of being unable to upgrade the kernel, but you really 
should shift that pain to your vendors and make them feel it too - not shift 
it towards kernel developers. We are doing our utmost best to give you the 
best technology on the planet, but one thing we cannot give you is a time 
machine that transports the features and fixes you care about back 4 years 
(and only those features) for free. (yet ;-)

Your 'cool feature' is another person's 'stupid upstream flux that just 
destabilizes the kernel', and what is stupid upstream flux for you may be a 
must-have cool feature for another person or organization. There's no way we 
can cut out just the feature stream that you care about and isolate you from 
the risks of all the other changes you don't care about.

And no, user-space libraries are not such a mechanism, for instrumentation 
technology. Most of the recent changes to perf events were in kernel logic 
that is not really sane to implement in user-space and which perfmon never 
even attempted to implement. So it's apples to oranges.

Also, i'd expect the core perf functionality to calm down, and i expect event 
table bugs to be fleshed out. We are keeping good backwards compatibility so 
even if you end up with an old kernel, you should have all the functionality 
that was implemented back then to work fine.

Also, it would be great if you could help extend our self-test mechanisms 
to make sure stupid event table bugs do not slip through. We have 'perf test' 
[which is very small at the moment] which could be used to add more regression 
tests - the ones you care about.

So even with an in-kernel design there's various ways you could help us 
improve the situation and you could thus also influence features in a 
direction that is favorable to you.

Thanks,

	Ingo


* [tip:perf/urgent] perf, x86: Fix incorrect branches event on AMD CPUs
  2010-07-01 19:30 [PATCH] perf wrong branches event on AMD Vince Weaver
  2010-07-01 19:54 ` Arnaldo Carvalho de Melo
  2010-07-02 11:38 ` Peter Zijlstra
@ 2010-07-03 13:58 ` tip-bot for Vince Weaver
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Vince Weaver @ 2010-07-03 13:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, paulus, hpa, mingo, a.p.zijlstra,
	robert.richter, fweisbec, stable, tglx, vweaver1, mingo,
	borislav.petkov

Commit-ID:  f287d332ce835f77a4f5077d2c0ef1e3f9ea42d2
Gitweb:     http://git.kernel.org/tip/f287d332ce835f77a4f5077d2c0ef1e3f9ea42d2
Author:     Vince Weaver <vweaver1@eecs.utk.edu>
AuthorDate: Thu, 1 Jul 2010 15:30:16 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sat, 3 Jul 2010 15:19:34 +0200

perf, x86: Fix incorrect branches event on AMD CPUs

While doing some performance counter validation tests on some
assembly language programs I noticed that the "branches:u"
count was very wrong on AMD machines.

It looks like the wrong event was selected.

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1007011526010.23160@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_amd.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 611df11..c2897b7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -102,8 +102,8 @@ static const u64 amd_perfmon_event_map[] =
   [PERF_COUNT_HW_INSTRUCTIONS]		= 0x00c0,
   [PERF_COUNT_HW_CACHE_REFERENCES]	= 0x0080,
   [PERF_COUNT_HW_CACHE_MISSES]		= 0x0081,
-  [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c4,
-  [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c5,
+  [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c2,
+  [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c3,
 };
 
 static u64 amd_pmu_event_map(int hw_event)
