public inbox for linux-kernel@vger.kernel.org
From: Mike Galbraith <efault@gmx.de>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>, Balazs Scheidler <bazsi@balabit.hu>,
	linux-kernel@vger.kernel.org, Willy Tarreau <w@1wt.eu>
Subject: Re: [patch] Re: scheduler oddity [bug?]
Date: Mon, 09 Mar 2009 16:30:49 +0100
Message-ID: <1236612649.6019.38.camel@marge.simson.net>
In-Reply-To: <1236609711.8389.583.camel@laptop>

On Mon, 2009-03-09 at 15:41 +0100, Peter Zijlstra wrote:
> On Mon, 2009-03-09 at 15:11 +0100, Mike Galbraith wrote:
> 
> > > Yes 2* worked fine.  Mysql+oltp was my worry spot, being a very affinity
> > > sensitive little <bleep>, but my patchlet didn't cause any trouble, so
> > > this one shouldn't either.  I'll do some re-test in any case, and squeak
> > > should anything turn up.
> > 
> > Squeak!  Didn't even get to mysql+oltp.
> > 
> > marge:..local/tmp # netperf -t UDP_STREAM -l 60 -H 127.0.0.1  -- -P 15888,12384 -s 32768 -S 32768 -m 4096
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15888 AF_INET to 127.0.0.1 (127.0.0.1) port 12384 AF_INET
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> > 
> >  65536    4096   60.00     5161103      0    2818.65
> >  65536           60.00     5149666           2812.40
> > 
> >  6188 root      20   0  1040  544  324 R  100  0.0   0:31.49 0 netperf
> >  6189 root      20   0  1044  260  164 R   48  0.0   0:15.35 3 netserver
> > 
> > Hurt, pain, ouch, vs...
> > 
> > marge:..local/tmp # netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -T 0,0 -- -P 15888,12384 -s 32768 -S 32768 -m 4096
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15888 AF_INET to 127.0.0.1 (127.0.0.1) port 12384 AF_INET : cpu bind
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> > 
> >  65536    4096   60.00     8452028      0    4615.93
> >  65536           60.00     8442945           4610.97
> > 
> > Drat.
> 
> Bugger, so back to the drawing board it is...

Hm.

CPU utilization wise, this test is similar to pipetest.  The major
difference is chunk size.  Netperf is waking and being preempted (if on
the same CPU) at a very high rate, so the hog component gets cpu in tiny
chunks, vs hefty chunks for pipetest.

Simply doing the below (will look very familiar) made both netperf and
pipetest happy again, because of that preemption rate.  Both start life
wanting to be affine, and due to the switch rate, pipetest becomes
non-affine, but netperf remains affine.
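
For anyone skimming, here is a rough userspace sketch of why chunk size
decides who stays affine.  It is not the patch (that is further down), only
its arithmetic.  Assumptions, none of them taken from this mail: update_avg()
is modelled as a 1/8-weight running average, sysctl_sched_migration_cost is
taken as 500000 ns, and stays_affine() is a hypothetical stand-in for the
avg_overlap check on the wakeup path.  The slice lengths are made up.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: value of sysctl_sched_migration_cost, in nanoseconds. */
#define MIGRATION_COST_NS 500000ULL

/* ASSUMPTION: models update_avg() as avg += (sample - avg) / 8. */
uint64_t grow_avg(uint64_t avg, uint64_t sample)
{
	int64_t diff = (int64_t)sample - (int64_t)avg;

	return (uint64_t)((int64_t)avg + diff / 8);
}

/* What the patch does on preemption: clamp the slice, feed it to the average. */
uint64_t on_preempt(uint64_t avg_overlap, uint64_t slice_ns)
{
	uint64_t runtime = slice_ns;

	if (runtime > 2 * MIGRATION_COST_NS)
		runtime = 2 * MIGRATION_COST_NS;

	return grow_avg(avg_overlap, runtime);
}

/* Preempt a task repeatedly after it ran slice_ns each time. */
uint64_t simulate_preemptions(uint64_t slice_ns, int preemptions)
{
	uint64_t avg_overlap = 0;
	int i;

	assert(preemptions >= 0);
	for (i = 0; i < preemptions; i++)
		avg_overlap = on_preempt(avg_overlap, slice_ns);

	return avg_overlap;
}

/* HYPOTHETICAL check: treat the task as affine while its overlap is small. */
int stays_affine(uint64_t avg_overlap)
{
	return avg_overlap <= MIGRATION_COST_NS;
}
```

With these made-up numbers, a task preempted every 20 us settles at an
avg_overlap of about 20 us and keeps passing the check, while one that runs
5 ms at a stretch is clamped to 2*migration_cost per sample and drifts above
the threshold.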

Maybe we should factor in wakeup rate, and whether we're waking many vs
one.  Wakeup is tied to data, so there is correlation to potential
cache-miss pain, no?

There is also evidence that your patch did in fact make the right
decision, but that we really REALLY should try to punt to a CPU that
shares a cache if available.  Check out the numbers when the netperf
test runs on two CPUs that share cache.

marge:..local/tmp # netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -T 0,1 -- -P 15888,12384 -s 32768 -S 32768 -m 4096
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 15888 AF_INET to 127.0.0.1 (127.0.0.1) port 12384 AF_INET : cpu bind
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

 65536    4096   60.00     15325632      0    8369.84
 65536           60.00     15321176           8367.40
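
To put the three runs side by side, the snippet below is plain arithmetic on
the sender-side throughput figures quoted in this mail (unbound, -T 0,0 and
-T 0,1).  The helper and macro names are made up for illustration.

```c
#include <assert.h>

/* Sender-side throughput from the three netperf runs, in 10^6 bits/sec. */
#define UNBOUND_MBIT      2818.65	/* no -T, tasks landed on CPUs 0 and 3 */
#define SAME_CPU_MBIT     4615.93	/* -T 0,0 */
#define SHARED_CACHE_MBIT 8369.84	/* -T 0,1 */

/* Ratio of one run's throughput to a baseline run's throughput. */
double speedup(double baseline_mbit, double candidate_mbit)
{
	assert(baseline_mbit > 0.0);
	return candidate_mbit / baseline_mbit;
}
```

speedup(UNBOUND_MBIT, SHARED_CACHE_MBIT) comes out just under 3x, and
speedup(UNBOUND_MBIT, SAME_CPU_MBIT) is around 1.6x.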

(You can skip the below, nothing new there.  Just for completeness;)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8e2558c..0f67b2a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4508,6 +4508,24 @@ static inline void schedule_debug(struct task_struct *prev)
 #endif
 }
 
+static void put_prev_task(struct rq *rq, struct task_struct *prev)
+{
+	if (prev->state == TASK_RUNNING) {
+		u64 runtime = prev->se.sum_exec_runtime;
+
+		runtime -= prev->se.prev_sum_exec_runtime;
+		runtime = min_t(u64, runtime, 2*sysctl_sched_migration_cost);
+
+		/*
+		 * In order to avoid avg_overlap growing stale when we are
+		 * indeed overlapping and hence not getting put to sleep, grow
+		 * the avg_overlap on preemption.
+		 */
+		update_avg(&prev->se.avg_overlap, runtime);
+	}
+	prev->sched_class->put_prev_task(rq, prev);
+}
+
 /*
  * Pick up the highest-prio task:
  */
@@ -4586,7 +4604,7 @@ need_resched_nonpreemptible:
 	if (unlikely(!rq->nr_running))
 		idle_balance(cpu, rq);
 
-	prev->sched_class->put_prev_task(rq, prev);
+	put_prev_task(rq, prev);
 	next = pick_next_task(rq, prev);
 
 	if (likely(prev != next)) {



