From: Peter Zijlstra <peterz@infradead.org>
To: Olaf Kirch <okir@suse.de>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
Andreas Gruenbacher <agruen@suse.de>,
Mike Galbraith <efault@gmx.de>
Subject: Re: CFS Performance Issues
Date: Thu, 28 May 2009 22:31:18 +0200
Message-ID: <1243542678.6645.101.camel@laptop>
In-Reply-To: <200905281502.22487.okir@suse.de>

On Thu, 2009-05-28 at 15:02 +0200, Olaf Kirch wrote:
> Hi Ingo,
>
> As you probably know, we've been chasing a variety of performance issues
> on our SLE11 kernel, and one of the suspects has been CFS for quite a
> while. The benchmarks that pointed to CFS include AIM7, dbench, and a few
> others, but the picture has been a bit hazy as to what is really the problem here.
>
> Now IBM recently told us they had played around with some scheduler
> tunables and found that by turning off NEW_FAIR_SLEEPERS, they
> could make the regression on a compute benchmark go away completely.
> We're currently working on rerunning other benchmarks with NEW_FAIR_SLEEPERS
> turned off to see whether it has an impact on these as well.
>
> Of course, the first question we asked ourselves was, how can NEW_FAIR_SLEEPERS
> affect a benchmark that rarely sleeps, or not at all?
>
> The answer was, it's not affecting the benchmark processes, but some noise
> going on in the background. When I was first able to reproduce this on my work
> station, it was knotify4 running in the background - using hardly any CPU, but
> getting woken up ~1000 times a second. Don't ask me what it's doing :-)
>
> So I sat down and reproduced this; the most recent iteration of the test program
> is courtesy of Andreas Gruenbacher (see below).
>
> This program spawns a number of processes that just spin in a loop. It also spawns
> a single process that wakes up 1000 times a second. Every second, it computes the
> average time slice per process (utime / number of involuntary context switches),
> and prints out the overall average time slice and average utime.
>
> While running this program, you can conveniently enable or disable fair sleepers.
> When I do this on my test machine (no desktop in the background this time :-)
> I see this:
>
> ../slice 16
> avg slice: 1.12 utime: 216263.187500
> avg slice: 0.25 utime: 125507.687500
> avg slice: 0.31 utime: 125257.937500
> avg slice: 0.31 utime: 125507.812500
> avg slice: 0.12 utime: 124507.875000
> avg slice: 0.38 utime: 124757.687500
> avg slice: 0.31 utime: 125508.000000
> avg slice: 0.44 utime: 125757.750000
> avg slice: 2.00 utime: 128258.000000
> ------ here I turned off new_fair_sleepers ----
> avg slice: 10.25 utime: 137008.500000
> avg slice: 9.31 utime: 139008.875000
> avg slice: 10.50 utime: 141508.687500
> avg slice: 9.44 utime: 139258.750000
> avg slice: 10.31 utime: 140008.687500
> avg slice: 9.19 utime: 139008.625000
> avg slice: 10.00 utime: 137258.625000
> avg slice: 10.06 utime: 135258.562500
> avg slice: 9.62 utime: 138758.562500
>
> As you can see, the average time slice is *extremely* low with new fair
> sleepers enabled. Turning it off, we get ~10ms time slices, and a
> performance that is roughly 10% higher. It looks like this kind of
> "silly time slice syndrome" is what is really eating performance here.
>
> After staring at place_entity for a while, and by watching the process'
> vruntime for a while, I think what's happening is this.
>
> With fair sleepers turned off, a process that just got woken up will
> get the vruntime of the process that's leftmost in the rbtree, and will
> thus be placed to the right of the current task.
>
> However, with fair_sleepers enabled, a newly woken up process
> will retain its old vruntime as long as it's less than sched_latency
> in the past, and thus it will be placed to the very left in the rbtree.
> Since a task that is mostly sleeping will never accrue vruntime at
> the same rate a cpu-bound task does, it will always preempt any
> running task almost immediately after it's scheduled.
>
> Does this make sense?

Yep, you got it right.
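
As a rough illustration of the placement rule described in the quoted analysis, here is a minimal Python model. It is a simplification under stated assumptions, not the kernel's place_entity(): the function name place_on_wakeup, its parameters, and the 20 ms latency value are hypothetical and chosen only for illustration.

```python
def place_on_wakeup(old_vruntime, min_vruntime, sched_latency, fair_sleepers):
    """Toy model of where a woken task lands on the vruntime timeline.

    Assumption: placement is the larger of the task's old vruntime and a
    floor. The floor is the leftmost vruntime, lowered by sched_latency
    when fair sleepers are enabled.
    """
    floor = min_vruntime
    if fair_sleepers:
        # Sleeps of up to one latency period are not charged.
        floor -= sched_latency
    # A task is never moved backwards past its own old vruntime.
    return max(old_vruntime, floor)


# Illustrative values in nanoseconds (assumed, not taken from the thread).
MIN_VRUNTIME = 100_000_000
SCHED_LATENCY = 20_000_000
sleeper_old = MIN_VRUNTIME - 5_000_000  # sleeper lags the leftmost task by 5 ms

off = place_on_wakeup(sleeper_old, MIN_VRUNTIME, SCHED_LATENCY, fair_sleepers=False)
on = place_on_wakeup(sleeper_old, MIN_VRUNTIME, SCHED_LATENCY, fair_sleepers=True)

print(off - MIN_VRUNTIME)  # 0
print(on - MIN_VRUNTIME)   # -5000000
```

In this model, with the feature off the sleeper lands level with the leftmost task. With it on, the sleeper keeps its 5 ms lead and so sorts ahead of every CPU-bound task, which matches the preemption pattern described in the quoted text.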
> Any insight you can offer here is greatly appreciated!

There's a class of applications and benchmarks that rather likes this
behaviour, particularly those that favour timely delivery of signals and
other wakeup driven thingies.
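
For reference, the metric the quoted test program reports (utime divided by the number of involuntary context switches) could be sampled roughly as follows. This is a sketch, not the test program from the thread: the helper names snapshot and avg_slice_ms are hypothetical, and it assumes a Unix system where Python's resource module is available.

```python
import resource


def snapshot():
    """Return (user CPU seconds, involuntary context switches) for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_nivcsw


def avg_slice_ms(before, after):
    """Average time slice in milliseconds between two snapshots.

    Returns None if the process was never involuntarily switched out
    during the interval, since the ratio is undefined in that case.
    """
    utime_s = after[0] - before[0]
    switches = after[1] - before[1]
    if switches == 0:
        return None
    return utime_s * 1000.0 / switches


# Made-up sample: 0.125 s of user time across 400 involuntary switches.
print(avg_slice_ms((0.0, 0), (0.125, 400)))  # 0.3125
```

A spinning process would call snapshot() once per reporting interval and pass consecutive snapshots to avg_slice_ms().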