public inbox for linux-kernel@vger.kernel.org
From: Mike Galbraith <efault@gmx.de>
To: Balazs Scheidler <bazsi@balabit.hu>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: scheduler oddity [bug?]
Date: Sun, 08 Mar 2009 11:02:02 +0100
Message-ID: <1236506522.6972.13.camel@marge.simson.net>
In-Reply-To: <1236506309.6972.8.camel@marge.simson.net>

On Sun, 2009-03-08 at 10:58 +0100, Mike Galbraith wrote:
> On Sun, 2009-03-08 at 10:42 +0100, Mike Galbraith wrote:
> > On Sat, 2009-03-07 at 18:47 +0100, Balazs Scheidler wrote:
> > > Hi,
> > > 
> > > I'm experiencing an odd behaviour from the Linux scheduler. I have an
> > > application that feeds data to another process using a pipe. Both
> > > processes use a fair amount of CPU time apart from writing to/reading
> > > from this pipe.
> > > 
> > > The machine I'm running on is an Opteron Quad-Core CPU:
> > > model name	: Quad-Core AMD Opteron(tm) Processor 2347 HE
> > > stepping	: 3
> > > 
> > > What I see is that only one of the cores is used; the other three are
> > > idle. If I explicitly set the CPU affinity of the processes so that
> > > they run on distinct CPUs, performance goes up significantly (i.e. the
> > > other cores start being used and the load scales linearly).
> > > 
> > > I've tried to reproduce the problem by writing a small test program,
> > > which you can find attached. The program creates two processes: one
> > > feeds the other through a pipe, and each does a series of memset()
> > > calls to simulate CPU load. I've also given the program the ability to
> > > set its own CPU affinity. The results (higher is better):
> > > 
> > > Without enabling CPU affinity:
> > > $ ./a.out
> > > Check: 0 loops/sec, sum: 1 
> > > Check: 12 loops/sec, sum: 13 
> > > Check: 41 loops/sec, sum: 54 
> > > Check: 41 loops/sec, sum: 95 
> > > Check: 41 loops/sec, sum: 136 
> > > Check: 41 loops/sec, sum: 177 
> > > Check: 41 loops/sec, sum: 218 
> > > Check: 40 loops/sec, sum: 258 
> > > Check: 41 loops/sec, sum: 299 
> > > Check: 41 loops/sec, sum: 340 
> > > Check: 41 loops/sec, sum: 381 
> > > Check: 41 loops/sec, sum: 422 
> > > Check: 41 loops/sec, sum: 463 
> > > Check: 41 loops/sec, sum: 504 
> > > Check: 41 loops/sec, sum: 545 
> > > Check: 40 loops/sec, sum: 585 
> > > Check: 41 loops/sec, sum: 626 
> > > Check: 41 loops/sec, sum: 667 
> > > Check: 41 loops/sec, sum: 708 
> > > Check: 41 loops/sec, sum: 749 
> > > Check: 41 loops/sec, sum: 790 
> > > Check: 41 loops/sec, sum: 831 
> > > Final: 39 loops/sec, sum: 831
> > > 
> > > 
> > > With CPU affinity:
> > > # ./a.out 1
> > > Check: 0 loops/sec, sum: 1 
> > > Check: 41 loops/sec, sum: 42 
> > > Check: 49 loops/sec, sum: 91 
> > > Check: 49 loops/sec, sum: 140 
> > > Check: 49 loops/sec, sum: 189 
> > > Check: 49 loops/sec, sum: 238 
> > > Check: 49 loops/sec, sum: 287 
> > > Check: 50 loops/sec, sum: 337 
> > > Check: 49 loops/sec, sum: 386 
> > > Check: 49 loops/sec, sum: 435 
> > > Check: 49 loops/sec, sum: 484 
> > > Check: 49 loops/sec, sum: 533 
> > > Check: 49 loops/sec, sum: 582 
> > > Check: 49 loops/sec, sum: 631 
> > > Check: 49 loops/sec, sum: 680 
> > > Check: 49 loops/sec, sum: 729 
> > > Check: 49 loops/sec, sum: 778 
> > > Check: 49 loops/sec, sum: 827 
> > > Check: 49 loops/sec, sum: 876 
> > > Check: 49 loops/sec, sum: 925 
> > > Check: 50 loops/sec, sum: 975 
> > > Check: 49 loops/sec, sum: 1024 
> > > Final: 48 loops/sec, sum: 1024
> > > 
> > > The difference is about 20% (831 vs. 1024 loops in total), which is
> > > roughly the amount of work performed by the slave process. If the two
> > > processes compete for the same CPU, this 20% of performance is lost.
> > > 
> > > I've tested this on 3 computers and each showed the same symptoms:
> > >  * quad core Opteron, running Ubuntu kernel 2.6.27-13.29
> > >  * Core 2 Duo, running Ubuntu kernel 2.6.27-11.27
> > >  * Dual Core Opteron, Debian backports.org kernel 2.6.26-13~bpo40+1
> > > 
> > > Is this a bug, or a feature?
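
The attached test program is not preserved in this copy of the message. The sketch below is a hypothetical reconstruction of the setup the report describes, not the original attachment: a parent (producer) feeds a child (consumer) through a pipe, both burn CPU with memset(), and a non-zero `pin` places them on CPUs 0 and 1 with sched_setaffinity(). The buffer size, round count, function names and the 255-message cap (the consumer reports its count through its exit status) are all assumptions.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed sizes: the original attachment's parameters are unknown. */
#define BUF_SIZE (64 * 1024)
#define BURN_ROUNDS 64

static volatile unsigned char sink;

/* Simulate CPU load with a series of memset() calls. */
static void burn_cpu(int rounds)
{
    static unsigned char buf[BUF_SIZE];
    for (int i = 0; i < rounds; i++) {
        memset(buf, i & 0xff, sizeof(buf));
        sink = buf[i % BUF_SIZE];   /* keep the stores observable */
    }
}

/* Pin the calling process to one CPU; returns 0 on success, -1 on error. */
static int pin_self(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Producer (parent) writes `messages` ints into a pipe, consumer (child)
 * reads them; both burn CPU per message. Returns the number of messages
 * the consumer processed, or -1 on error. With `pin` set, the caller is
 * pinned to CPU 0 and the child to CPU 1 (best effort: assumes both CPUs
 * exist). The count travels back as the child's exit status, hence the
 * hypothetical 0..255 limit. */
int run_pipe_test(int messages, int pin)
{
    int fds[2];
    if (messages < 0 || messages > 255 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                         /* consumer */
        close(fds[1]);
        if (pin)
            (void)pin_self(1);
        int value, count = 0;
        /* 4-byte writes to a pipe are atomic, so reads see whole ints. */
        while (read(fds[0], &value, sizeof(value)) == (ssize_t)sizeof(value)) {
            burn_cpu(BURN_ROUNDS);
            count++;
        }
        close(fds[0]);
        _exit(count);
    }

    close(fds[0]);                          /* producer */
    if (pin)
        (void)pin_self(0);
    for (int i = 0; i < messages; i++) {
        burn_cpu(BURN_ROUNDS);
        if (write(fds[1], &i, sizeof(i)) != (ssize_t)sizeof(i))
            break;
    }
    close(fds[1]);                          /* EOF ends the consumer loop */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Timing run_pipe_test(n, 0) against run_pipe_test(n, 1) is one way to compare an unpinned and a pinned run; note that pinning leaves the calling process bound to CPU 0 afterwards, and that any difference depends on the kernel and hardware.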
> > 
> > Both. Affine wakeups are cache friendly, and generally a feature, but
> > they can lead to underutilized CPUs in some cases, turning the feature
> > into a bug, as your testcase demonstrates. The metric we use for the
> > affinity hint works well, but clearly wants some refinement.
> > 
> > You can turn this scheduler hint off via:
> > 	echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
> > 
> 
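
A simplified user-space model of the affinity hint, offered as a sketch under assumptions rather than the scheduler's actual code. It assumes avg_overlap is a running average with a 1/8 weight (modelled on the kernel's update_avg()) that is fed only when a task goes to sleep, as the unpatched dequeue_task() context in this message suggests, and that a sync wakeup stays affine only while both waker and wakee have an avg_overlap at or below sysctl_sched_migration_cost, assumed here to be its 0.5 ms default. sync_hint_holds() and overlap_after() are hypothetical names; times are in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sysctl_sched_migration_cost defaults to 0.5 ms. */
#define MIGRATION_COST_NS 500000LL

/* 1/8-weight running average modelled on the kernel's update_avg();
 * the kernel shifts by 3, plain division is used here for portability. */
void update_avg(int64_t *avg, int64_t sample)
{
    *avg += (sample - *avg) / 8;
}

/* Hypothetical: avg_overlap after a series of sleep-path samples,
 * starting from zero, the value a new task is born with. A task that
 * never sleeps contributes no samples and stays at zero. */
int64_t overlap_after(const int64_t *sleep_samples_ns, int n)
{
    int64_t avg = 0;
    for (int i = 0; i < n; i++)
        update_avg(&avg, sleep_samples_ns[i]);
    return avg;
}

/* Hypothetical: the sync (affine) wakeup hint survives only while neither
 * task's avg_overlap exceeds the migration cost. */
bool sync_hint_holds(int64_t waker_overlap_ns, int64_t wakee_overlap_ns)
{
    return waker_overlap_ns <= MIGRATION_COST_NS &&
           wakee_overlap_ns <= MIGRATION_COST_NS;
}
```

Under these assumptions a task that never sleeps keeps an avg_overlap of zero and so never vetoes the hint; as long as its partner also stays under the limit, every wakeup remains affine despite the heavy execution overlap.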

(reply got munged)

> The problem with your particular testcase is that while one half has an
> avg_overlap (what we use as the affinity hint for synchronous wakeups)
> that triggers the affinity hint, the other half has an avg_overlap of
> zero, the value it was born with. So despite significant execution
> overlap, the scheduler treats them as if they were truly synchronous
> tasks.
> 
> The patch below cures it, but it is only a demo hack.

diff --git a/kernel/sched.c b/kernel/sched.c
index 8e2558c..85f9ced 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1712,11 +1712,15 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	u64 limit = sysctl_sched_migration_cost;
+	u64 runtime = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime;
+
 	if (sleep && p->se.last_wakeup) {
 		update_avg(&p->se.avg_overlap,
 			   p->se.sum_exec_runtime - p->se.last_wakeup);
 		p->se.last_wakeup = 0;
-	}
+	} else if (p->se.avg_overlap < limit && runtime >= limit)
+		update_avg(&p->se.avg_overlap, runtime);
 
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
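
To see what the demo patch does to avg_overlap, here is a user-space model of the patched dequeue_task() hunk. struct se_model is a hypothetical stand-in for the relevant struct sched_entity fields, times are in nanoseconds, the 0.5 ms limit is an assumed default for sysctl_sched_migration_cost, and the model says nothing about when the kernel actually calls dequeue_task().

```c
#include <stdbool.h>
#include <stdint.h>

#define MIGRATION_COST_NS 500000LL   /* assumed default */

/* Hypothetical subset of struct sched_entity. */
struct se_model {
    int64_t sum_exec_runtime;
    int64_t prev_sum_exec_runtime;
    int64_t last_wakeup;
    int64_t avg_overlap;
};

/* 1/8-weight running average, modelled on the kernel's update_avg(). */
static void update_avg(int64_t *avg, int64_t sample)
{
    *avg += (sample - *avg) / 8;
}

/* Mirrors the logic of the patched hunk: the sleep path is unchanged; the
 * new branch feeds the runtime since the task was last picked into
 * avg_overlap while avg_overlap is still below the limit and that runtime
 * is at least the limit. */
void dequeue_model(struct se_model *se, bool sleep)
{
    int64_t limit = MIGRATION_COST_NS;
    int64_t runtime = se->sum_exec_runtime - se->prev_sum_exec_runtime;

    if (sleep && se->last_wakeup) {
        update_avg(&se->avg_overlap, se->sum_exec_runtime - se->last_wakeup);
        se->last_wakeup = 0;
    } else if (se->avg_overlap < limit && runtime >= limit) {
        update_avg(&se->avg_overlap, runtime);
    }
}
```

Starting from zero with 2 ms runtimes, avg_overlap goes 250000, 468750, 660156 ns and then stops changing, because the new branch only fires while avg_overlap is below the limit; runtimes shorter than the limit never touch it.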

pipetest (6701, #threads: 1)
---------------------------------------------------------
se.exec_start                      :       5607096.896687
se.vruntime                        :        274158.274352
se.sum_exec_runtime                :        139434.783417
se.avg_overlap                     :             6.477067 <== was zero
nr_switches                        :                 2246
nr_voluntary_switches              :                    1
nr_involuntary_switches            :                 2245
se.load.weight                     :                 1024
policy                             :                    0
prio                               :                  120
clock-delta                        :                  102

pipetest (6702, #threads: 1)
---------------------------------------------------------
se.exec_start                      :       5607096.896687
se.vruntime                        :        274098.273516
se.sum_exec_runtime                :         32987.899515
se.avg_overlap                     :             0.502174 <== was always < migration cost
nr_switches                        :                13631
nr_voluntary_switches              :                11639
nr_involuntary_switches            :                 1992
se.load.weight                     :                 1024
policy                             :                    0
prio                               :                  120
clock-delta                        :                  117
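
The statistics above are in the format of /proc/<pid>/sched, which is available on kernels built with CONFIG_SCHED_DEBUG; its field names vary between kernel versions, and the time fields here appear to be in milliseconds. A small sketch for pulling one field out of such a file; read_sched_field() and read_pid_sched_field() are hypothetical helpers, and the 256-byte line limit is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up one "name : value" line in a /proc/<pid>/sched-style file.
 * Returns 0 and stores the value on success, -1 if the file cannot be
 * opened, the field is absent, or its value does not parse. */
int read_sched_field(const char *path, const char *field, double *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[256];
    size_t len = strlen(field);
    int rc = -1;

    while (fgets(line, sizeof(line), f)) {
        /* Exact name match: the name must be followed by whitespace or
         * ':' so that "se.load" does not match "se.load.weight". */
        if (strncmp(line, field, len) != 0 ||
            (line[len] != ' ' && line[len] != '\t' && line[len] != ':'))
            continue;
        char *colon = strchr(line + len, ':');
        if (colon) {
            char *end;
            double v = strtod(colon + 1, &end);   /* skips leading blanks */
            if (end != colon + 1) {
                *out = v;
                rc = 0;
            }
        }
        break;
    }
    fclose(f);
    return rc;
}

/* Convenience wrapper for a live process. */
int read_pid_sched_field(int pid, const char *field, double *out)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/sched", pid);
    return read_sched_field(path, field, out);
}
```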



