From: Peter Zijlstra <peterz@infradead.org>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
"Ma, Ling" <ling.ma@intel.com>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
ego@in.ibm.com, svaidy@linux.vnet.ibm.com
Subject: Re: [patch] sched: fix SMT scheduler regression in find_busiest_queue()
Date: Mon, 15 Feb 2010 23:29:33 +0100 [thread overview]
Message-ID: <1266272973.3650.101.camel@laptop> (raw)
In-Reply-To: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
On Fri, 2010-02-12 at 17:14 -0800, Suresh Siddha wrote:
> From: Suresh Siddha <suresh.b.siddha@intel.com>
> Subject: sched: fix SMT scheduler regression in find_busiest_queue()
>
> Fix a SMT scheduler performance regression that is leading to a scenario
> where SMT threads in one core are completely idle while both the SMT threads
> in another core (on the same socket) are busy.
>
> This is caused by this commit (with the problematic code highlighted)
>
> commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Tue Sep 1 10:34:38 2009 +0200
>
> sched: Try to deal with low capacity
>
> @@ -4203,15 +4223,18 @@ find_busiest_queue()
> ...
> for_each_cpu(i, sched_group_cpus(group)) {
> + unsigned long power = power_of(i);
>
> ...
>
> - wl = weighted_cpuload(i);
> + wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
> + wl /= power;
>
> - if (rq->nr_running == 1 && wl > imbalance)
> + if (capacity && rq->nr_running == 1 && wl > imbalance)
> continue;
>
> On a SMT system, power of the HT logical cpu will be 589 and
> the scheduler load imbalance (for scenarios like the one mentioned above)
> can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
> the weighted load with the power will result in "wl > imbalance" and
> ultimately resulting in find_busiest_queue() return NULL, causing
> load_balance() to think that the load is well balanced. But infact
> one of the tasks can be moved to the idle core for optimal performance.
>
> We don't need to use the weighted load (wl) scaled by the cpu power to
> compare with imabalance. In that condition, we already know there is only a
> single task "rq->nr_running == 1" and the comparison between imbalance,
> wl is to make sure that we select the correct priority thread which matches
> imbalance. So we really need to compare the imabalnce with the original
> weighted load of the cpu and not the scaled load.
>
> But in other conditions where we want the most hammered(busiest) cpu, we can
> use scaled load to ensure that we consider the cpu power in addition to the
> actual load on that cpu, so that we can move the load away from the
> guy that is getting most hammered with respect to the actual capacity,
> as compared with the rest of the cpu's in that busiest group.
>
> Fix it.
>
> Reported-by: Ma Ling <ling.ma@intel.com>
> Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: stable@kernel.org [2.6.32.x]
A reproduction case would have been nice, I've been playing with busy
loops and plotting the cpus on paper, but I didn't manage to reproduce.
Still, I went through the logic and it seems to make sense, so:
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Ingo, sed -e 's/sched\.c/sched_fair.c/g', makes it apply to tip/master
and should provide means of solving the rebase/merge conflict.
> ---
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3a8fb30..bef5369 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4119,12 +4119,23 @@ find_busiest_queue(struct sched_group *group, enum cpu_idle_type idle,
> continue;
>
> rq = cpu_rq(i);
> - wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
> - wl /= power;
> + wl = weighted_cpuload(i);
>
> + /*
> + * When comparing with imbalance, use weighted_cpuload()
> + * which is not scaled with the cpu power.
> + */
> if (capacity && rq->nr_running == 1 && wl > imbalance)
> continue;
>
> + /*
> + * For the load comparisons with the other cpu's, consider
> + * the weighted_cpuload() scaled with the cpu power, so that
> + * the load can be moved away from the cpu that is potentially
> + * running at a lower capacity.
> + */
> + wl = (wl * SCHED_LOAD_SCALE) / power;
> +
> if (wl > max_load) {
> max_load = wl;
> busiest = rq;
>
>
next prev parent reply other threads:[~2010-02-15 22:29 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-13 1:14 [patch] sched: fix SMT scheduler regression in find_busiest_queue() Suresh Siddha
2010-02-13 1:31 ` change in sched cpu_power causing regressions with SCHED_MC Suresh Siddha
2010-02-13 10:36 ` Peter Zijlstra
2010-02-13 10:42 ` Peter Zijlstra
2010-02-13 18:37 ` Vaidyanathan Srinivasan
2010-02-13 18:49 ` Suresh Siddha
2010-02-13 18:39 ` Vaidyanathan Srinivasan
2010-02-19 2:16 ` Suresh Siddha
2010-02-19 12:32 ` Arun R Bharadwaj
2010-02-19 13:03 ` Vaidyanathan Srinivasan
2010-02-19 19:15 ` Suresh Siddha
2010-02-19 14:05 ` Peter Zijlstra
2010-02-19 18:36 ` Suresh Siddha
2010-02-19 19:47 ` Peter Zijlstra
2010-02-19 19:50 ` Suresh Siddha
2010-02-19 20:02 ` Peter Zijlstra
2010-02-20 1:13 ` Suresh Siddha
2010-02-22 18:50 ` Peter Zijlstra
2010-02-24 0:13 ` Suresh Siddha
2010-02-24 17:43 ` Peter Zijlstra
2010-02-24 19:31 ` Suresh Siddha
2010-02-26 10:24 ` [tip:sched/core] sched: Fix SCHED_MC regression caused by change in sched cpu_power tip-bot for Suresh Siddha
2010-02-26 14:55 ` tip-bot for Suresh Siddha
2010-02-19 19:52 ` change in sched cpu_power causing regressions with SCHED_MC Peter Zijlstra
2010-02-13 18:33 ` Vaidyanathan Srinivasan
2010-02-13 18:27 ` [patch] sched: fix SMT scheduler regression in find_busiest_queue() Vaidyanathan Srinivasan
2010-02-13 18:39 ` Suresh Siddha
2010-02-13 18:56 ` Vaidyanathan Srinivasan
2010-02-13 20:25 ` Vaidyanathan Srinivasan
2010-02-13 20:36 ` Vaidyanathan Srinivasan
2010-02-14 10:11 ` Peter Zijlstra
2010-02-15 12:35 ` Vaidyanathan Srinivasan
2010-02-15 13:00 ` Peter Zijlstra
2010-02-16 15:59 ` Vaidyanathan Srinivasan
2010-02-16 17:28 ` Peter Zijlstra
2010-02-16 18:25 ` Vaidyanathan Srinivasan
2010-02-16 18:46 ` Vaidyanathan Srinivasan
2010-02-16 18:48 ` Peter Zijlstra
2010-02-15 22:29 ` Peter Zijlstra [this message]
2010-02-16 14:16 ` [tip:sched/urgent] sched: Fix " tip-bot for Suresh Siddha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1266272973.3650.101.camel@laptop \
--to=peterz@infradead.org \
--cc=ego@in.ibm.com \
--cc=ling.ma@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=suresh.b.siddha@intel.com \
--cc=svaidy@linux.vnet.ibm.com \
--cc=yanmin_zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.