* CFS Performance Issues
@ 2009-05-28 13:02 Olaf Kirch
2009-05-28 15:49 ` David Newall
2009-05-28 20:31 ` Peter Zijlstra
0 siblings, 2 replies; 6+ messages in thread
From: Olaf Kirch @ 2009-05-28 13:02 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, Andreas Gruenbacher
[-- Attachment #1: Type: text/plain, Size: 3671 bytes --]
Hi Ingo,
As you probably know, we've been chasing a variety of performance issues
on our SLE11 kernel, and one of the suspects has been CFS for quite a
while. The benchmarks that pointed to CFS include AIM7, dbench, and a few
others, but the picture has been a bit hazy as to what is really the problem here.
Now IBM recently told us they had played around with some scheduler
tunables and found that by turning off NEW_FAIR_SLEEPERS, they
could make the regression on a compute benchmark go away completely.
We're currently working on rerunning other benchmarks with NEW_FAIR_SLEEPERS
turned off to see whether it has an impact on these as well.
Of course, the first question we asked ourselves was, how can NEW_FAIR_SLEEPERS
affect a benchmark that rarely sleeps, or not at all?
The answer was, it's not affecting the benchmark processes, but some noise
going on in the background. When I was first able to reproduce this on my work
station, it was knotify4 running in the background - using hardly any CPU, but
getting woken up ~1000 times a second. Don't ask me what it's doing :-)
So I sat down and reproduced this; the most recent iteration of the test program
is courtesy of Andreas Gruenbacher (see below).
This program spawns a number of processes that just spin in a loop. It also spawns
a single process that wakes up 1000 times a second. Every second, it computes the
average time slice per process (utime / number of involuntary context switches),
and prints out the overall average time slice and average utime.
While running this program, you can conveniently enable or disable fair sleepers.
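(For reference, toggling it at runtime is just a write to the sched_features
file in debugfs. A minimal sketch follows, assuming debugfs is mounted at
/sys/kernel/debug and that the kernel exposes the feature under that name -
a plain "echo NO_NEW_FAIR_SLEEPERS > sched_features" from a shell does the
same thing.)

/*
 * Sketch only: clear the NEW_FAIR_SLEEPERS scheduler feature via debugfs.
 * Assumes debugfs is mounted at /sys/kernel/debug; adjust the path if your
 * setup differs.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/sched_features", "w");

	if (!f) {
		perror("sched_features");
		return 1;
	}
	/* Writing NO_<FEATURE> clears a feature, plain <FEATURE> sets it. */
	fputs("NO_NEW_FAIR_SLEEPERS\n", f);
	fclose(f);
	return 0;
}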
When I do this on my test machine (no desktop in the background this time :-)
I see this:
./slice 16
avg slice: 1.12 utime: 216263.187500
avg slice: 0.25 utime: 125507.687500
avg slice: 0.31 utime: 125257.937500
avg slice: 0.31 utime: 125507.812500
avg slice: 0.12 utime: 124507.875000
avg slice: 0.38 utime: 124757.687500
avg slice: 0.31 utime: 125508.000000
avg slice: 0.44 utime: 125757.750000
avg slice: 2.00 utime: 128258.000000
------ here I turned off new_fair_sleepers ----
avg slice: 10.25 utime: 137008.500000
avg slice: 9.31 utime: 139008.875000
avg slice: 10.50 utime: 141508.687500
avg slice: 9.44 utime: 139258.750000
avg slice: 10.31 utime: 140008.687500
avg slice: 9.19 utime: 139008.625000
avg slice: 10.00 utime: 137258.625000
avg slice: 10.06 utime: 135258.562500
avg slice: 9.62 utime: 138758.562500
As you can see, the average time slice is *extremely* low with new fair
sleepers enabled. Turning it off, we get ~10ms time slices, and a
performance that is roughly 10% higher. It looks like this kind of
"silly time slice syndrome" is what is really eating performance here.
After staring at place_entity for a while, and by watching the process'
vruntime for a while, I think what's happening is this.
With fair sleepers turned off, a process that just got woken up will
get the vruntime of the process that's leftmost in the rbtree, and will
thus be placed to the right of the current task.
However, with fair_sleepers enabled, a newly woken up process
will retain its old vruntime as long as it's less than sched_latency
in the past, and thus it will be placed to the very left in the rbtree.
Since a task that is mostly sleeping will never accrue vruntime at
the same rate a cpu-bound task does, it will always preempt any
running task almost immediately after it's scheduled.
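To spell out my reading of it, here is a boiled-down sketch of the wakeup
placement - not the actual kernel code, just the logic of place_entity for
the non-initial case, with the inputs passed in explicitly and names made
up for clarity:

/*
 * Simplified sketch of CFS wakeup placement (place_entity, wakeup case).
 * Not the real kernel code; min_vruntime stands in for cfs_rq->min_vruntime,
 * old_vruntime for the woken task's se->vruntime, and sched_latency for
 * sysctl_sched_latency.
 */
typedef unsigned long long u64;

static u64 place_woken_task(u64 min_vruntime, u64 old_vruntime,
			    u64 sched_latency, int new_fair_sleepers)
{
	u64 vruntime = min_vruntime;

	/* Sleeper credit: sleeps of up to one sched_latency don't count,
	 * so the woken task may be placed up to sched_latency to the left. */
	if (new_fair_sleepers)
		vruntime -= sched_latency;

	/* Never gain time by being placed backwards: a task that slept only
	 * briefly keeps its old vruntime; a long sleeper gets clamped. */
	if (old_vruntime > vruntime)
		vruntime = old_vruntime;

	return vruntime;
}

With new_fair_sleepers off, the woken sleeper ends up at min_vruntime; with
it on, it ends up as much as one sched_latency to the left of that - ahead
of every CPU hog in the tree.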
Does this make sense?
Any insight you can offer here is greatly appreciated!
Thanks,
Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com
[-- Attachment #2: slice.c --]
[-- Type: text/x-csrc, Size: 2106 bytes --]
/*
 * from agruen - 2009 05 28
 *
 * Test time slices given to each process
 */
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/stat.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#undef WITH_PER_PROCESS_SLICES

struct msg {
	long mtype;
	long nivcsw;
	long utime;
};

int msqid;

/*
 * SIGALRM handler: send the parent the utime and involuntary context
 * switch deltas accumulated since the previous report.
 */
void report_to_parent(int dummy)
{
	static long old_nivcsw, old_utime;
	long utime;
	struct rusage usage;
	struct msg msg;

	getrusage(RUSAGE_SELF, &usage);
	utime = usage.ru_utime.tv_sec * 1000000 + usage.ru_utime.tv_usec;

	msg.mtype = 1;
	msg.nivcsw = usage.ru_nivcsw - old_nivcsw;
	msg.utime = utime - old_utime;
	msgsnd(msqid, &msg, sizeof(msg) - sizeof(long), 0);

	old_nivcsw = usage.ru_nivcsw;
	old_utime = utime;
}

/* CPU-bound worker: spins forever, reporting whenever the parent sends SIGALRM. */
void worker(void)
{
	struct sigaction sa;

	sa.sa_handler = report_to_parent;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGALRM, &sa, NULL);

	while (1)
		/* do nothing */ ;
}

/* Background "noise" process: wakes up roughly 1000 times a second. */
void sleeper(void)
{
	while (1) {
		usleep(1000);
	}
}

int main(int argc, char *argv[])
{
	int n, nproc;
	pid_t *pid;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <number-of-processes>\n", argv[0]);
		return 1;
	}

	msqid = msgget(IPC_PRIVATE, S_IRUSR | S_IWUSR);

	nproc = atoi(argv[1]);
	pid = malloc(nproc * sizeof(pid_t));

	/* Fork the CPU hogs */
	for (n = 0; n < nproc; n++) {
		pid[n] = fork();
		if (pid[n] == 0)
			worker();
	}

	/* Fork sleeper(s) - one per eight workers */
	for (n = 0; n < (nproc + 7) / 8; n++)
		if (fork() == 0)
			sleeper();

	for (;;) {
		double avg_slice = 0;	/* double, so per-process fractions aren't truncated */
		long avg_utime = 0;

		sleep(1);

		/* Ask every worker to report, then collect the replies. */
		for (n = 0; n < nproc; n++)
			kill(pid[n], SIGALRM);
		for (n = 0; n < nproc; n++) {
			struct msg msg;
			double slice;

			msgrcv(msqid, &msg, sizeof(msg) - sizeof(long), 0, 0);
			/* average slice in ms: utime (us) per involuntary context switch */
			slice = 0.001 * msg.utime /
				(msg.nivcsw ? msg.nivcsw : 1);
#ifdef WITH_PER_PROCESS_SLICES
			printf("%6.1f ", slice);
#endif
			avg_slice += slice;
			avg_utime += msg.utime;
		}
		printf(" avg slice: %5.2f utime: %f",
			avg_slice / nproc,
			(double) avg_utime / nproc);
		printf("\n");
	}
	return 0;
}
* Re: CFS Performance Issues
2009-05-28 13:02 CFS Performance Issues Olaf Kirch
@ 2009-05-28 15:49 ` David Newall
2009-05-28 18:20 ` Olaf Kirch
2009-05-28 20:31 ` Peter Zijlstra
1 sibling, 1 reply; 6+ messages in thread
From: David Newall @ 2009-05-28 15:49 UTC (permalink / raw)
To: Olaf Kirch; +Cc: linux-kernel, mingo, Andreas Gruenbacher
Olaf Kirch wrote:
> As you probably know, we've been chasing a variety of performance issues
> ...
> I see this:
>
> ./slice 16
> avg slice: 1.12 utime: 216263.187500
> ...
> Any insight you can offer here is greatly appreciated!
>
About that: avg slice is in nsec, not msec (the old, off-by-one-million
bug), and utime, also an average, is in usec.
The first result indicates 1.12 nsec per context switch, 193 context
switches and 346% CPU utilisation. You must have at least four CPU
cores. Here's your table, extended* per this interpretation:
./slice 16
avg slice: 1.12 utime: 216263.187500: 1.12 nsec/csw, 193 csw, 346 CPU%
avg slice: 0.25 utime: 125507.687500: 0.25 nsec/csw, 502 csw, 200 CPU%
avg slice: 0.31 utime: 125257.937500: 0.31 nsec/csw, 404 csw, 200 CPU%
avg slice: 0.31 utime: 125507.812500: 0.31 nsec/csw, 404 csw, 200 CPU%
avg slice: 0.12 utime: 124507.875000: 0.12 nsec/csw, 1037 csw, 199 CPU%
avg slice: 0.38 utime: 124757.687500: 0.38 nsec/csw, 328 csw, 199 CPU%
avg slice: 0.31 utime: 125508.000000: 0.31 nsec/csw, 404 csw, 200 CPU%
avg slice: 0.44 utime: 125757.750000: 0.44 nsec/csw, 285 csw, 201 CPU%
avg slice: 2.00 utime: 128258.000000: 2.00 nsec/csw, 64 csw, 205 CPU%
------ here I turned off new_fair_sleepers ----
avg slice: 10.25 utime: 137008.500000: 10.25 nsec/csw, 13 csw, 219 CPU%
avg slice: 9.31 utime: 139008.875000: 9.31 nsec/csw, 14 csw, 222 CPU%
avg slice: 10.50 utime: 141508.687500: 10.50 nsec/csw, 13 csw, 226 CPU%
avg slice: 9.44 utime: 139258.750000: 9.44 nsec/csw, 14 csw, 222 CPU%
avg slice: 10.31 utime: 140008.687500: 10.31 nsec/csw, 13 csw, 224 CPU%
avg slice: 9.19 utime: 139008.625000: 9.19 nsec/csw, 15 csw, 222 CPU%
avg slice: 10.00 utime: 137258.625000: 10.00 nsec/csw, 13 csw, 219 CPU%
avg slice: 10.06 utime: 135258.562500: 10.06 nsec/csw, 13 csw, 216 CPU%
avg slice: 9.62 utime: 138758.562500: 9.62 nsec/csw, 14 csw, 222 CPU%
You don't seem to be getting good CPU utilisation.
*awk '{printf "%s: %5.2f nsec/csw, %4d csw, %3d CPU%%\n", $0, $3, $5/$3/1000, $5*16/10000}'
* Re: CFS Performance Issues
2009-05-28 15:49 ` David Newall
@ 2009-05-28 18:20 ` Olaf Kirch
2009-05-28 18:43 ` David Newall
0 siblings, 1 reply; 6+ messages in thread
From: Olaf Kirch @ 2009-05-28 18:20 UTC (permalink / raw)
To: David Newall; +Cc: linux-kernel, mingo, Andreas Gruenbacher
On Thursday 28 May 2009 17:49:06 David Newall wrote:
> > I see this:
> >
> > ./slice 16
> > avg slice: 1.12 utime: 216263.187500
> > ...
> > Any insight you can offer here is greatly appreciated!
> >
>
> About that: avg slice is in nsec, not msec (the old, off-by-one-million
> bug), and utime, also an average, is in usec.
I'm not sure I'm following what you're saying.
The test program uses getrusage(SELF) to get the utime, which is in
microseconds.
For the average time slice, it takes ru_nivcsw, i.e. the number of involuntary
context switches as reported by getrusage, and computes
utime / nivcsw / 1000
which is a millisecond quantity.
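To put numbers on it, take the third data line above: 125257.94 usec of utime
over the roughly 404 involuntary switches from your table gives
125257.94 / 404 / 1000 ~= 0.31, i.e. an average slice of 0.31 milliseconds,
not nanoseconds.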
> The first result indicates 1.12 nsec per context switch, 193 context
> switches and 346% CPU utilisation. You must have at least four CPU
Consider the first line a bogon - it's crap resulting from variables not being
initialized with proper values the first time around. This machine is actually a
core duo, and as you notice, most of the time the numbers add up to
roughly 200% CPU utilization.
But you're right that it sometimes reports more than 200% - that's actually
a bug in the test case. It's not guaranteed that we take the samples
exactly 1 second apart, so the utime needs to be scaled by the
elapsed wall time.
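Something along these lines should do it (a quick, untested sketch - the
helper and the gettimeofday-based timestamps around each sampling round
are made up for illustration):

/*
 * Sketch: normalize the sampled utime to the actual wall-clock interval,
 * so utilization can't exceed 100% per CPU just because a sample came in
 * late. "last" and "now" are timevals taken with gettimeofday() around
 * each sampling round.
 */
#include <sys/time.h>

static double normalize_utime(long utime_us,
			      const struct timeval *last,
			      const struct timeval *now)
{
	double elapsed_us = (now->tv_sec - last->tv_sec) * 1e6
			  + (now->tv_usec - last->tv_usec);

	/* scale to a nominal 1-second interval */
	return utime_us * (1e6 / elapsed_us);
}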
I'll fix that and come back.
Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com
* Re: CFS Performance Issues
2009-05-28 18:20 ` Olaf Kirch
@ 2009-05-28 18:43 ` David Newall
0 siblings, 0 replies; 6+ messages in thread
From: David Newall @ 2009-05-28 18:43 UTC (permalink / raw)
To: Olaf Kirch; +Cc: linux-kernel, mingo, Andreas Gruenbacher
Olaf Kirch wrote:
>> About that: avg slice is in nsec, not msec (the old, off-by-one-million
>> bug), and utime, also an average, is in usec.
>>
> ...
> utime / ninvcsw / 1000
>
> which is a millisecond quantity.
>
Sorry. My confusion. (The old, divide-instead-of-multiply bug.)
* Re: CFS Performance Issues
2009-05-28 13:02 CFS Performance Issues Olaf Kirch
2009-05-28 15:49 ` David Newall
@ 2009-05-28 20:31 ` Peter Zijlstra
2009-05-30 11:18 ` Ingo Molnar
1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2009-05-28 20:31 UTC (permalink / raw)
To: Olaf Kirch; +Cc: linux-kernel, mingo, Andreas Gruenbacher, Mike Galbraith
On Thu, 2009-05-28 at 15:02 +0200, Olaf Kirch wrote:
> Hi Ingo,
>
> As you probably know, we've been chasing a variety of performance issues
> on our SLE11 kernel, and one of the suspects has been CFS for quite a
> while. The benchmarks that pointed to CFS include AIM7, dbench, and a few
> others, but the picture has been a bit hazy as to what is really the problem here.
>
> Now IBM recently told us they had played around with some scheduler
> tunables and found that by turning off NEW_FAIR_SLEEPERS, they
> could make the regression on a compute benchmark go away completely.
> We're currently working on rerunning other benchmarks with NEW_FAIR_SLEEPERS
> turned off to see whether it has an impact on these as well.
>
> Of course, the first question we asked ourselves was, how can NEW_FAIR_SLEEPERS
> affect a benchmark that rarely sleeps, or not at all?
>
> The answer was, it's not affecting the benchmark processes, but some noise
> going on in the background. When I was first able to reproduce this on my work
> station, it was knotify4 running in the background - using hardly any CPU, but
> getting woken up ~1000 times a second. Don't ask me what it's doing :-)
>
> So I sat down and reproduced this; the most recent iteration of the test program
> is courtesy of Andreas Gruenbacher (see below).
>
> This program spawns a number of processes that just spin in a loop. It also spawns
> a single process that wakes up 1000 times a second. Every second, it computes the
> average time slice per process (utime / number of involuntary context switches),
> and prints out the overall average time slice and average utime.
>
> While running this program, you can conveniently enable or disable fair sleepers.
> When I do this on my test machine (no desktop in the background this time :-)
> I see this:
>
> ./slice 16
> avg slice: 1.12 utime: 216263.187500
> avg slice: 0.25 utime: 125507.687500
> avg slice: 0.31 utime: 125257.937500
> avg slice: 0.31 utime: 125507.812500
> avg slice: 0.12 utime: 124507.875000
> avg slice: 0.38 utime: 124757.687500
> avg slice: 0.31 utime: 125508.000000
> avg slice: 0.44 utime: 125757.750000
> avg slice: 2.00 utime: 128258.000000
> ------ here I turned off new_fair_sleepers ----
> avg slice: 10.25 utime: 137008.500000
> avg slice: 9.31 utime: 139008.875000
> avg slice: 10.50 utime: 141508.687500
> avg slice: 9.44 utime: 139258.750000
> avg slice: 10.31 utime: 140008.687500
> avg slice: 9.19 utime: 139008.625000
> avg slice: 10.00 utime: 137258.625000
> avg slice: 10.06 utime: 135258.562500
> avg slice: 9.62 utime: 138758.562500
>
> As you can see, the average time slice is *extremely* low with new fair
> sleepers enabled. Turning it off, we get ~10ms time slices, and a
> performance that is roughly 10% higher. It looks like this kind of
> "silly time slice syndrome" is what is really eating performance here.
>
> After staring at place_entity for a while, and by watching the process'
> vruntime for a while, I think what's happening is this.
>
> With fair sleepers turned off, a process that just got woken up will
> get the vruntime of the process that's leftmost in the rbtree, and will
> thus be placed to the right of the current task.
>
> However, with fair_sleepers enabled, a newly woken up process
> will retain its old vruntime as long as it's less than sched_latency
> in the past, and thus it will be placed to the very left in the rbtree.
> Since a task that is mostly sleeping will never accrue vruntime at
> the same rate a cpu-bound task does, it will always preempt any
> running task almost immediately after it's scheduled.
>
> Does this make sense?
Yep, you got it right.
> Any insight you can offer here is greatly appreciated!
There's a class of applications and benchmarks that rather likes this
behaviour, particularly those that favour timely delivery of signals and
other wakeup driven thingies.
* Re: CFS Performance Issues
2009-05-28 20:31 ` Peter Zijlstra
@ 2009-05-30 11:18 ` Ingo Molnar
0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2009-05-30 11:18 UTC (permalink / raw)
To: Olaf Kirch, Peter Zijlstra, Mike Galbraith
Cc: linux-kernel, mingo, Andreas Gruenbacher, Mike Galbraith
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2009-05-28 at 15:02 +0200, Olaf Kirch wrote:
> > Hi Ingo,
> >
> > As you probably know, we've been chasing a variety of
> > performance issues on our SLE11 kernel, and one of the suspects
> > has been CFS for quite a while. The benchmarks that pointed to
> > CFS include AIM7, dbench, and a few others, but the picture has
> > been a bit hazy as to what is really the problem here.
> >
> > Now IBM recently told us they had played around with some
> > scheduler tunables and found that by turning off
> > NEW_FAIR_SLEEPERS, they could make the regression on a compute
> > benchmark go away completely. We're currently working on
> > rerunning other benchmarks with NEW_FAIR_SLEEPERS turned off to
> > see whether it has an impact on these as well.
> >
> > Of course, the first question we asked ourselves was, how can
> > NEW_FAIR_SLEEPERS affect a benchmark that rarely sleeps, or not
> > at all?
> >
> > The answer was, it's not affecting the benchmark processes, but
> > some noise going on in the background. When I was first able to
> > reproduce this on my work station, it was knotify4 running in
> > the background - using hardly any CPU, but getting woken up
> > ~1000 times a second. Don't ask me what it's doing :-)
> >
> > So I sat down and reproduced this; the most recent iteration of
> > the test program is courtesy of Andreas Gruenbacher (see below).
> >
> > This program spawns a number of processes that just spin in a
> > loop. It also spawns a single process that wakes up 1000 times a
> > second. Every second, it computes the average time slice per
> > process (utime / number of involuntary context switches), and
> > prints out the overall average time slice and average utime.
> >
> > While running this program, you can conveniently enable or
> > disable fair sleepers. When I do this on my test machine (no
> > desktop in the background this time :-) I see this:
> >
> > ./slice 16
> > avg slice: 1.12 utime: 216263.187500
> > avg slice: 0.25 utime: 125507.687500
> > avg slice: 0.31 utime: 125257.937500
> > avg slice: 0.31 utime: 125507.812500
> > avg slice: 0.12 utime: 124507.875000
> > avg slice: 0.38 utime: 124757.687500
> > avg slice: 0.31 utime: 125508.000000
> > avg slice: 0.44 utime: 125757.750000
> > avg slice: 2.00 utime: 128258.000000
> > ------ here I turned off new_fair_sleepers ----
> > avg slice: 10.25 utime: 137008.500000
> > avg slice: 9.31 utime: 139008.875000
> > avg slice: 10.50 utime: 141508.687500
> > avg slice: 9.44 utime: 139258.750000
> > avg slice: 10.31 utime: 140008.687500
> > avg slice: 9.19 utime: 139008.625000
> > avg slice: 10.00 utime: 137258.625000
> > avg slice: 10.06 utime: 135258.562500
> > avg slice: 9.62 utime: 138758.562500
> >
> > As you can see, the average time slice is *extremely* low with
> > new fair sleepers enabled. Turning it off, we get ~10ms time
> > slices, and a performance that is roughly 10% higher. It looks
> > like this kind of "silly time slice syndrome" is what is really
> > eating performance here.
> >
> > After staring at place_entity for a while, and by watching the
> > process' vruntime for a while, I think what's happening is this.
> >
> > With fair sleepers turned off, a process that just got woken up
> > will get the vruntime of the process that's leftmost in the
> > rbtree, and will thus be placed to the right of the current
> > task.
> >
> > However, with fair_sleepers enabled, a newly woken up process
> > will retain its old vruntime as long as it's less than
> > sched_latency in the past, and thus it will be placed to the
> > very left in the rbtree. Since a task that is mostly sleeping
> > will never accrue vruntime at the same rate a cpu-bound task
> > does, it will always preempt any running task almost immediately
> > after it's scheduled.
> >
> > Does this make sense?
>
> Yep, you got it right.
>
> > Any insight you can offer here is greatly appreciated!
>
> There's a class of applications and benchmarks that rather likes
> this behaviour, particularly those that favour timely delivery of
> signals and other wakeup driven thingies.
Yes.
Firstly, thanks Olaf for measuring and analyzing this so carefully.
From your description i get the impression that you are trying to
maximize throughput for benchmarks - AIM9 and dbench live and die on
the ability to batch the workload.
If that is indeed the goal, could you try your measurement with
SCHED_BATCH enabled for all relevant (server and client) tasks?
SCHED_BATCH is a hint to the scheduler that the workload does not
care about wakeup latency. Do you still get the new-fair-sleepers
sensitivity in that case?
The easiest way to set SCHED_BATCH is to do this in a shell:
schedtool -B $$
and then restart all server tasks ('service mysqld restart' for
example) and start the benchmark - all child tasks will have
SCHED_BATCH set.
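(If it is more convenient to set it from inside the benchmark itself, the
equivalent is a sched_setscheduler() call - a minimal sketch; SCHED_BATCH
takes no static priority, so sched_priority must be 0:)

/* Sketch: switch the calling process (and future children) to SCHED_BATCH. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(0, SCHED_BATCH, &param) < 0) {
		perror("sched_setscheduler");
		return 1;
	}
	/* ... exec or run the actual workload here ... */
	return 0;
}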
<plug>
Also, analysis of such problems can generally be done faster and
more accurately (and the results are more convincing) if you use
perfcounters. It's very easy to set it up, as long as you have Core2
(or later) Intel CPUs or AMD CPUs. Pull this tree:
git pull \
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git \
perfcounters/core
No configuration needed - build it (accept the defaults), and boot
it and do:
cd Documentation/perf_counter/
make -j
and you are set. You can try:
./perf stat -a sleep 10
to get a 10 seconds snapshot of what's going on in the system:
aldebaran:~> perf stat -a sleep 10
Performance counter stats for 'sleep':
159827.019426 task clock ticks (msecs)
1274 context switches # 0.000 M/sec
78 CPU migrations # 0.000 M/sec
7777 pagefaults # 0.000 M/sec
2236492601 CPU cycles # 13.993 M/sec
1908732654 instructions # 11.942 M/sec
5059857 cache references # 0.032 M/sec
503188 cache misses # 0.003 M/sec
Wall-clock time elapsed: 10008.471596 msecs
You can also do 'perf stat dbench 10' type of measurements to only
measure that particular workload. In particular the context switch
rate, the cache-miss rate and the ratio between task-clock-ticks and
wall-clock-time (parallelization efficiency) can be pretty telling
about how 'healthy' a benchmark is - and how various tunables (such
as new-fair-sleepers) affect it.
Say 'hackbench 20' gives this output on a testbox of mine:
aldebaran:~> perf stat ./hackbench 20
Time: 0.882
Performance counter stats for './hackbench':
12117.839049 task clock ticks (msecs)
63444 context switches # 0.005 M/sec
6972 CPU migrations # 0.001 M/sec
35356 pagefaults # 0.003 M/sec
34561571429 CPU cycles # 2852.123 M/sec
26980655850 instructions # 2226.524 M/sec
115693364 cache references # 9.547 M/sec
57710692 cache misses # 4.762 M/sec
Wall-clock time elapsed: 996.284274 msecs
This is a 2.8 GHz CPU, and the cycles/sec value of 2852 M/sec (~2.85 GHz) shows that
the workload is running at max CPU speed. 2.2 billion instructions
per second is OK-ish. Cache-miss rate is a bit high - but that's not
unexpected: hackbench 20 runs 400 tasks in parallel.
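(The task-clock/wall-clock ratio illustrates the parallelization point:
12117.8 msecs of task time packed into 996.3 msecs of wall time means
hackbench kept roughly 12 CPUs busy on average - and the 'sleep 10'
snapshot above, 159827 task-clock msecs over ~10008 wall-clock msecs,
suggests this is a 16-CPU box, so that's about 75% of the machine.)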
perf stat results become even more interesting when they are
compared. You might want to give it a try.
Thanks,
Ingo