From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [patch 9/9] Scheduler profiling - Use conditional calls
Date: Fri, 1 Jun 2007 12:33:33 -0400
Message-ID: <20070601163333.GA3242@Krystal>
In-Reply-To: <20070601091909.97570a16.akpm@linux-foundation.org>

* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Fri, 1 Jun 2007 11:54:13 -0400 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > * Andrew Morton (akpm@linux-foundation.org) wrote:
> > > On Wed, 30 May 2007 10:00:34 -0400
> > > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > 
> > > > @@ -2990,7 +2991,8 @@
> > > >  			print_irqtrace_events(prev);
> > > >  		dump_stack();
> > > >  	}
> > > > -	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> > > > +	cond_call(profile_on,
> > > > +		profile_hit(SCHED_PROFILING, __builtin_return_address(0)));
> > > >  
> > > 
> > > That's looking pretty neat.  Do you have any before-and-after performance
> > > figures for i386 and for a non-optimised architecture?
> > 
> > Sure, here is the result of a small test comparing:
> > 1 - Branch depending on a cache miss (has to fetch from memory, caused by
> >     a 128-byte stride). This test is the closest to the side effect the
> >     original profile_hit code was causing, under the assumption that the
> >     kernel is already using the L1 and L2 caches at their full capacity
> >     and that a supplementary data load would cause cache thrashing.
> > 2 - Branch depending on L1 cache hit. Just for comparison.
> > 3 - Branch depending on a load immediate in the instruction stream.
> > 
> > It has been compiled with gcc -O2. Tests done on a 3GHz P4.
> > 
> > In the first test series, the branch is not taken:
> > 
> > number of tests : 1000
> > number of branches per test : 81920
> > memory hit cycles per iteration (mean) : 48.252
> > L1 cache hit cycles per iteration (mean) : 16.1693
> > instruction stream based test, cycles per iteration (mean) : 16.0432
> > 
> > 
> > In the second test series, the branch is taken and an integer is
> > incremented within the block:
> > 
> > number of tests : 1000
> > number of branches per test : 81920
> > memory hit cycles per iteration (mean) : 48.2691
> > L1 cache hit cycles per iteration (mean) : 16.396
> > instruction stream based test, cycles per iteration (mean) : 16.0441
> > 
> > Therefore, the memory fetch based test seems to be 200% slower than the
> > load immediate based test.
> 
> Confused.  From what did you calculate that 200%?
> 
> > (I am adding these results to the documentation)
> 
> Good, thanks.


(48.2691 - 16.0441) / 16.0441 ~= 2.01

This means the test runs about 200% slower when the branch condition is
fetched from main memory than when it is a load immediate.

Put another way, the speedup of the load immediate over the memory fetch
is about 3:

48.2691 / 16.0441 ~= 3.01

Is there a preferred way to present these results in the documentation?

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
