From: Mike Galbraith <efault@gmx.de>
To: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
mingo@elte.hu, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] sched: Disable affine wakeups by default
Date: Sun, 25 Oct 2009 23:04:47 +0100
Message-ID: <1256508287.17306.14.camel@marge.simson.net>
In-Reply-To: <20091025123319.2b76bf69@infradead.org>
On Sun, 2009-10-25 at 12:33 -0700, Arjan van de Ven wrote:
> On Sun, 25 Oct 2009 18:38:09 +0100
> Mike Galbraith <efault@gmx.de> wrote:
> > > > Even if you're sharing a cache, there are reasons to wake
> > > > affine. If the wakee can preempt the waker while it's still
> > > > eligible to run, wakee not only eats toasty warm data, it can
> > > > hand the cpu back to the waker so it can make more and repeat
> > > > this procedure for a while without someone else getting in
> > > > between, and trashing cache.
> > >
> > > and on the flipside, and this is the workload I'm looking at,
> > > this is halving your performance roughly due to one core being
> > > totally busy while the other one is idle.
> >
> > Yeah, the "one pgsql+oltp pair" in the numbers I posted show that
> > problem really well. If you can hit an idle shared cache at low load,
> > go for it every time.
>
> sadly the current code does not do this ;(
> my patch might be too big an axe for it, but it does solve this part ;)
The below fixed up the pgsql+oltp low end, but has a negative effect on
the high end. Must be some stuttering going on.
> I'll keep digging to see if we can do a more micro-incursion.
>
> > Hm. That looks like a bug, but after any task has scheduled a few
> > times, if it looks like a synchronous task, it'll glue itself to its
> > waker's runqueue regardless. Initial wakeup may disperse, but it will
> > come back if it's not overlapping.
>
> the problem is the "synchronous to WHAT" question.
> It may be synchronous to the disk for example; in the testcase I'm
> looking at, we get "send message to X. do some more code. hit a page
> cache miss and do IO" quite a bit.
Hm. Yes, disk could be problematic. It's going to be exactly what the
affinity code looks for: you wake somebody, and almost immediately go to
sleep. OTOH, even housekeeper threads make warm data.
> > > The numbers you posted are for a database, and only measure
> > > throughput. There's more to the world than just databases /
> > > throughput-only computing, and I'm trying to find low impact ways
> > > to reduce the latency aspect of things. One obvious candidate is
> > > hyperthreading/SMT where it IS basically free to switch to a
> > > sibling, so wake-affine does not really make sense there.
> >
> > It's also almost free on my Q6600 if we aimed for idle shared cache.
>
> yeah multicore with shared cache falls for me in the same bucket.
Anyone with a non-shared cache multicore would be most unhappy with my
little test hack.
> > I agree fully that affinity decisions could be more perfect than they
> > are. Getting it wrong is very expensive either way.
>
> Looks like we agree on a key principle:
> If there is a free cpu "close enough" (SMT or MC basically), the
> wakee should just run on that.
>
> we may not agree on what to do if there's no completely free logical
> cpu, but a much lighter loaded one instead.
> but first we need to let code speak ;)
mysql+oltp
clients       1        2        4        8       16       32       64      128      256
tip    10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47  (3x avg)
tip+   10071.16 18498.33 34697.17 34275.20 32761.96 31657.10 30223.70 27363.50 24698.71
        9971.57 18290.17 34632.46 34204.59 32588.94 31513.19 30081.51 27504.66 24832.24
        9884.04 18502.26 34650.08 34250.13 32707.81 31566.86 29954.19 27417.09 24811.75

pgsql+oltp
clients       1        2        4        8       16       32       64      128      256
tip    13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94  (3x avg)
tip+   15163.56 28882.70 52374.32 52469.79 51739.79 50602.02 49827.18 48029.84 46191.90
       15258.65 28778.77 52716.46 52405.32 51434.21 50440.66 49718.89 48082.22 46124.56
       15278.02 28178.55 52815.82 52609.98 51729.17 50652.10 49800.19 48126.95 46286.58
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 37087a7..fa534f0 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1374,6 +1374,8 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 
 	rcu_read_lock();
 	for_each_domain(cpu, tmp) {
+		int level = tmp->level;
+
 		/*
 		 * If power savings logic is enabled for a domain, see if we
 		 * are not overloaded, if so, don't balance wider.
@@ -1398,11 +1400,28 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 			want_sd = 0;
 		}
 
+		/*
+		 * look for an idle shared cache before looking at last CPU.
+		 */
 		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
-		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
+		    (level == SD_LV_SIBLING || level == SD_LV_MC)) {
+			int i;
+			for_each_cpu(i, sched_domain_span(tmp)) {
+				if (!cpu_rq(i)->cfs.nr_running) {
+					affine_sd = tmp;
+					want_affine = 0;
+					cpu = i;
+				}
+			}
+		} else if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
+			   cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
 			affine_sd = tmp;
 			want_affine = 0;
+
+			if ((level == SD_LV_SIBLING || level == SD_LV_MC) &&
+			    !cpu_rq(prev_cpu)->cfs.nr_running)
+				cpu = prev_cpu;
 		}
 
 		if (!want_sd && !want_affine)