public inbox for linux-kernel@vger.kernel.org
* perf_event self-monitoring overhead regression
@ 2011-11-30  5:20 Vince Weaver
  2011-11-30 11:54 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2011-11-30  5:20 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel@vger.kernel.org, a.p.zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian

Hello

I've been tracking a performance regression with self-monitoring and 
perf_event.

For a simple start/stop/read test, the overhead has increased about 10%
from the 2.6.32 kernel to 3.1.  (this has been measured on a variety
of x86_64 machines).

This is as measured with the POSIX clock_gettime(CLOCK_REALTIME,&time)
calls.  Potentially the issue is with this and not with perf_events.
As you can imagine it is hard to measure the performance of the perf_event
interface since you can't invoke perf_event on it.

In any case, I was trying to bisect some of these performance issues.  
There was another jump in overhead between 3.0 and 3.1, so I tried there.
I had a bisectable test case, but after a tedious day-long bisect run the 
problem bisected down to

   commit 2d86a3f04e345b03d5e429bfe14985ce26bff4dc
   Merge: 3960ef3 5ddac6b
   Author: Linus Torvalds <torvalds@linux-foundation.org>
   Date:   Tue Jul 26 17:13:04 2011 -0700

    Merge branch 'next/board' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/l
    
Which seems unlikely.  My git skills really aren't enough to try to figure
out why an ARM board merge would affect the overhead of the perf_event
syscalls on x86_64.

Is there a better way for trying to track down performance regressions 
like this?

Thanks,

Vince
vweaver1@eecs.utk.edu


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: perf_event self-monitoring overhead regression
  2011-11-30  5:20 perf_event self-monitoring overhead regression Vince Weaver
@ 2011-11-30 11:54 ` Peter Zijlstra
  2011-11-30 17:44   ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2011-11-30 11:54 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo, Stephane Eranian, Linus Torvalds

On Wed, 2011-11-30 at 00:20 -0500, Vince Weaver wrote:
> Hello
> 
> I've been tracking a performance regression with self-monitoring and 
> perf_event.
> 
> For a simple start/stop/read test, the overhead has increased about 10%
> from the 2.6.32 kernel to 3.1.  (this has been measured on a variety
> of x86_64 machines).
> 
> This is as measured with the POSIX clock_gettime(CLOCK_REALTIME,&time)
> calls.  Potentially the issue is with this and not with perf_events.
> As you can imagine it is hard to measure the performance of the perf_event
> interface since you can't invoke perf_event on it.

If you've got a stable TSC on your machine you can of course revert to
userspace TSC reads and eliminate clock_gettime() from the picture.

> In any case, I was trying to bisect some of these performance issues.  
> There was another jump in overhead between 3.0 and 3.1, so I tried there.
> I had a bisectable test case, but after a tedious day-long bisect run the 
> problem bisected down to
> 
>    commit 2d86a3f04e345b03d5e429bfe14985ce26bff4dc
>    Merge: 3960ef3 5ddac6b
>    Author: Linus Torvalds <torvalds@linux-foundation.org>
>    Date:   Tue Jul 26 17:13:04 2011 -0700
> 
>     Merge branch 'next/board' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/l
>     
> Which seems unlikely.  My git skills really aren't enough to try to figure
> out why an ARM board merge would affect the overhead of the perf_event
> syscalls on x86_64.
> 
> Is there a better way for trying to track down performance regressions 
> like this?

I CC'ed Linus since he's way too skilled at this git thing and always
has good advice. There might be very good bisect advice in the lkml
archives but I'm not sure there's anything like a HOWTO/FAQ on the
subject other than the git-bisect manpage (ought there be one?).

One thing that was suggested at the last KS is a git bisect mode that
jumps on merge commits instead of random points in the history. And only
once it's isolated a particular merge will it bisect that one merge at
the commit level.

Something like that might help by reducing noise, or it might not.
Bisect is often (as you well know by now) a lot harder in practice than
it sounds in theory :/


* Re: perf_event self-monitoring overhead regression
  2011-11-30 11:54 ` Peter Zijlstra
@ 2011-11-30 17:44   ` Linus Torvalds
  2011-12-01 13:59     ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2011-11-30 17:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, Ingo Molnar, linux-kernel@vger.kernel.org,
	Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian

On Wed, Nov 30, 2011 at 3:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> I CC'ed Linus since he's way too skilled at this git thing and always
> has good advice. There might be very good bisect advice in the lkml
> archives but I'm not sure there's anything like a HOWTO/FAQ on the
> subject other than the git-bisect manpage (ought there be one?).

Basically, "git bisect" will never really work unless you have a very
unambiguous case: the way bisection works (and the reason it is so
efficient) really requires that the good/bad decision be entirely
black-and-white with absolutely no room for error.

I suspect that in some performance-related thing like this, that kind
of black-and-white situation simply doesn't end up existing. For all
we know, it could be at least partly about cache layout etc, and
trying to bisect it is likely futile, unless you have a *really* good
testcase.

So bisect works really well for clear bugs that are 100% repeatable
and have a very clear "did it happen or not" signature, but I'd be
very leery indeed of using it with some performance anomaly.

That said, maybe you can get the timings to be unambiguous enough by
using 'rdtsc' in user space, and try the bisect again.

                        Linus


* Re: perf_event self-monitoring overhead regression
  2011-11-30 17:44   ` Linus Torvalds
@ 2011-12-01 13:59     ` Vince Weaver
  2011-12-01 18:11       ` stephane eranian
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2011-12-01 13:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org,
	Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian

On Wed, 30 Nov 2011, Linus Torvalds wrote:

> On Wed, Nov 30, 2011 at 3:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> So bisect works really well for clear bugs that are 100% repeatable
> and have a very clear "did it happen or not" signature, but I'd be
> very leery indeed of using it with some performance anomaly.

In this case, the bisection case is pretty clean.  I run 1000 of the tests 
and check the median value, and the values are always 7280us for good and 
7440us for bad.

The problem is when it starts bisecting into the merges it drops from a 
post-3.0 kernel into a much earlier 3.0-rc kernel (due to the ARM merge 
history) and then the test suddenly becomes meaningless as it starts 
returning other values such as 6880us.  This is because the problem I am 
tracking has been gradually getting worse over time, and so I guess by 
going back to a 3.0-rc it gets earlier than some other change that made 
performance worse, and thus it becomes impossible to know if a commit is 
good or not.

> That said, maybe you can get the timings to be unambiguous enough by
> using 'rdtsc' in user space, and try the bisect again.

I'll try but I'm guessing your first thought that this might be 
un-bisectable is probably true.  

Thanks,

Vince



* Re: perf_event self-monitoring overhead regression
  2011-12-01 13:59     ` Vince Weaver
@ 2011-12-01 18:11       ` stephane eranian
  2011-12-06 21:07         ` Vince Weaver
  0 siblings, 1 reply; 9+ messages in thread
From: stephane eranian @ 2011-12-01 18:11 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo

Vince,

I take it your test is all about self-monitoring, single event, single thread.
Did you try breaking down the cost using TSC and rdtsc() to pinpoint where the
regression is coming from in the 3 perf_event syscalls you're using?


On Thu, Dec 1, 2011 at 5:59 AM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> On Wed, 30 Nov 2011, Linus Torvalds wrote:
>
>> On Wed, Nov 30, 2011 at 3:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>>
>> So bisect works really well for clear bugs that are 100% repeatable
>> and have a very clear "did it happen or not" signature, but I'd be
>> very leery indeed of using it with some performance anomaly.
>
> In this case, the bisection case is pretty clean.  I run 1000 of the tests
> and check the median value, and the values are always 7280us for good and
> 7440us for bad.
>
> The problem is when it starts bisecting into the merges it drops from a
> post-3.0 kernel into a much earlier 3.0-rc kernel (due to the ARM merge
> history) and then the test suddenly becomes meaningless as it starts
> returning other values such as 6880us.  This is because the problem I am
> tracking has been gradually getting worse over time, and so I guess by
> going back to a 3.0-rc it gets earlier than some other change that made
> performance worse, and thus it becomes impossible to know if a commit is
> good or not.
>
>> That said, maybe you can get the timings to be unambiguous enough by
>> using 'rdtsc' in user space, and try the bisect again.
>
> I'll try but I'm guessing your first thought that this might be
> un-bisectable is probably true.
>
> Thanks,
>
> Vince
>


* Re: perf_event self-monitoring overhead regression
  2011-12-01 18:11       ` stephane eranian
@ 2011-12-06 21:07         ` Vince Weaver
  2011-12-08  0:44           ` stephane eranian
  0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2011-12-06 21:07 UTC (permalink / raw)
  To: eranian
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo

On Thu, 1 Dec 2011, stephane eranian wrote:

> I take it your test is all about self-monitoring, single event, single thread.
>> Did you try breaking down the cost using TSC and rdtsc() to pinpoint where the
> regression is coming from in the 3 perf_event syscalls you're using?

I've started gathering results using rdtsc() and the results are even more 
puzzling.  

See:
   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/benchmarks/rdtsc_overhead/core2_raw_null_kernel_rdtsc.png
for example.

Those results are from a core2 machine, with the CPU scaling governor set 
to "performance", the test bound to CPU0, and the test run 1000 times.

Part of the issue is that a few of those kernels are Debian unstable 
distro kernels and not hand-compiled stock kernels.  I'll rebuild a full 
set of kernels myself and see if I can reproduce the results.

Vince



* Re: perf_event self-monitoring overhead regression
  2011-12-06 21:07         ` Vince Weaver
@ 2011-12-08  0:44           ` stephane eranian
  2011-12-08 18:34             ` Vince Weaver
  2011-12-09 22:22             ` Vince Weaver
  0 siblings, 2 replies; 9+ messages in thread
From: stephane eranian @ 2011-12-08  0:44 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo

On Tue, Dec 6, 2011 at 1:07 PM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> On Thu, 1 Dec 2011, stephane eranian wrote:
>
>> I take it your test is all about self-monitoring, single event, single thread.
>> Did you try breaking down the cost using  TSC and rdtsc() to pinpoint where the
>> regression is coming from in the 3 perf_event syscalls you're using?
>
> I've started gathering results using rdtsc() and the results are even more
> puzzling.
>
Are those the results for surrounding the ioctl() with rdtsc()?

What do the axis actually represent?

> See:
>   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/benchmarks/rdtsc_overhead/core2_raw_null_kernel_rdtsc.png
> for example.
>
> Those results are from a core2 machine, with the CPU scaling governor set
> to "performance", the test bound to CPU0, and the test run 1000 times.
>
> Part of the issue is that a few of those kernels are Debian unstable
> distro kernels and not hand-compiled stock kernels.  I'll rebuild a full
> set of kernels myself and see if I can reproduce the results.
>
> Vince
>


* Re: perf_event self-monitoring overhead regression
  2011-12-08  0:44           ` stephane eranian
@ 2011-12-08 18:34             ` Vince Weaver
  2011-12-09 22:22             ` Vince Weaver
  1 sibling, 0 replies; 9+ messages in thread
From: Vince Weaver @ 2011-12-08 18:34 UTC (permalink / raw)
  To: eranian
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo

On Wed, 7 Dec 2011, stephane eranian wrote:

> Are those the results for surrounding the ioctl() with rdtsc()?
> 
> What do the axis actually represent?
> 

The code looks like this:

   start_before=rdtsc();
   ret1=ioctl(fd[0], PERF_EVENT_IOC_ENABLE,0);
   start_after=rdtsc();

Similar for stop and read.  The events are set up in advance.

I've put up a page with some more graphs (though they're a bit rough yet)
and more background on what the tests are doing here:
   
 http://web.eecs.utk.edu/~vweaver1/projects/perf-events/benchmarks/rdtsc_overhead/

The overall overhead results I'm seeing with rdtsc() are the same I saw 
using the clock_gettime() calls.

The total overhead of start/stop/read seems to be getting worse with 
each kernel release, but the individual start, stop and read calls don't
follow an obvious pattern.

I'm rebuilding stock kernels with the same compiler/config and I'll re-run
the tests but it might be a few days before I'll have those results.

Thanks,

Vince
vweaver1@eecs.utk.edu



* Re: perf_event self-monitoring overhead regression
  2011-12-08  0:44           ` stephane eranian
  2011-12-08 18:34             ` Vince Weaver
@ 2011-12-09 22:22             ` Vince Weaver
  1 sibling, 0 replies; 9+ messages in thread
From: Vince Weaver @ 2011-12-09 22:22 UTC (permalink / raw)
  To: eranian
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Mackerras,
	Arnaldo Carvalho de Melo


Here are the complete results, using official kernels.

   http://web.eecs.utk.edu/~vweaver1/projects/perf-events/benchmarks/rdtsc_overhead/

The overall overhead of start/stop/read has been getting worse.  3.1.0 was 
bad and the current 3.2.0-rc4-tip tree is even worse.

I'm not sure if it's worth trying to track down the changes that cause 
these regressions, as they might still be considered just noise, though 
the best to worst performance degradation is ~15% when attempting to 
self-monitor with 3 events on a core2 machine.

Vince


