From: Peter Zijlstra <peterz@infradead.org>
To: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Cc: mingo@kernel.org, LKML <linux-kernel@vger.kernel.org>,
roland@kernel.org, Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
efault@gmx.de, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] sched: balance_cpu to consider other cpus in its group as target of (pinned) task migration
Date: Mon, 04 Jun 2012 11:00:54 +0200
Message-ID: <1338800454.2448.71.camel@twins>
In-Reply-To: <4FCC4E3B.4090209@linux.vnet.ibm.com>
On Mon, 2012-06-04 at 11:27 +0530, Prashanth Nageshappa wrote:
> Based on the description in
> http://marc.info/?l=linux-kernel&m=133108682113018&w=2 , I was able to
> recreate a problem wherein a SCHED_OTHER thread never gets any runtime,
> even though there is one allowed CPU where it can run and make progress.
Do not use external references in changelogs if you don't absolutely
need to.
> On a dual socket box (4 cores per socket, 2 threads per core) with following
> config:
>  0  8  1  9        4 12  5 13
>  2 10  3 11        6 14  7 15
> |___________|     |___________|
>    socket 1          socket 2
>
> If we have following 4 tasks (2 SCHED_FIFO and 2 SCHED_OTHER) started in the
> following order:
> 1> SCHED_FIFO cpu hogging task bound to cpu 1
> 2> SCHED_OTHER cpu hogging task bound to cpus 3 & 9 - running on cpu 3
> sleeps and wakes up after all other tasks are started
> 3> SCHED_FIFO cpu hogging task bound to cpu 3
> 4> SCHED_OTHER cpu hogging task bound to cpu 9
>
> Once all 4 tasks are started, we observe that the 2nd task is starved of
> CPU after waking up. When it wakes up, it wakes up on its prev_cpu (3),
> where a FIFO task is now hogging the cpu. To prevent starvation, the 2nd
> task needs to be pulled to cpu 9. However, of cpus 1 and 9, cpu 1 is the
> one chosen to attempt pulling tasks towards its core. When it tries to
> pull the 2nd task towards its core, it is unable to do so, as cpu 1 is
> not in the 2nd task's cpus_allowed mask. Ideally, cpu 1 should note that
> the task can be moved to its sibling and trigger that movement.
>
> In this patch, we try to identify whether the load balancing goal was
> left partly unmet because the destination cpu is not in the cpus_allowed
> mask of the target task(s), and if so we retry load balancing, attempting
> to move tasks to other cpus in the same sched_group as the destination
> cpu.
Either expand a little more on the implementation or preferably add some
comments.
> Tested on tip commit cca44889.
This is not a sane addition to the changelog.
> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Did vatsa write this patch? If so, you forgot a From header, if not,
wtf!?
> Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
>
> ----
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index de49ed5..da275d8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3098,6 +3098,7 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10;
>
> #define LBF_ALL_PINNED 0x01
> #define LBF_NEED_BREAK 0x02
> +#define LBF_NEW_DST_CPU 0x04
>
> struct lb_env {
> struct sched_domain *sd;
> @@ -3108,6 +3109,8 @@ struct lb_env {
> int dst_cpu;
> struct rq *dst_rq;
>
> + struct cpumask *dst_grpmask;
> + int new_dst_cpu;
> enum cpu_idle_type idle;
> long imbalance;
> unsigned int flags;
> @@ -3198,7 +3201,25 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> * 3) are cache-hot on their current CPU.
> */
> if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
> - schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> + int new_dst_cpu;
> +
> + if (!env->dst_grpmask) {
> + schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> + return 0;
> + }
> + /*
> + * check if cpus_allowed has any cpus in the same sched_group
> + * as that of dst_cpu and set LBF_NEW_DST_CPU and new_dst_cpu
> + * accordingly
> + */
> + new_dst_cpu = cpumask_first_and(env->dst_grpmask,
> + tsk_cpus_allowed(p));
> + if (new_dst_cpu >= nr_cpu_ids) {
> + schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
> + } else {
> + env->flags |= LBF_NEW_DST_CPU;
> + env->new_dst_cpu = new_dst_cpu;
> + }
Only a flimsy comment on what the code does -- I can read C thank you.
Not a single comment on why it does this.
> return 0;
> }
> env->flags &= ~LBF_ALL_PINNED;
> @@ -4418,7 +4439,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> struct sched_domain *sd, enum cpu_idle_type idle,
> int *balance)
> {
> - int ld_moved, active_balance = 0;
> + int ld_moved, old_ld_moved, active_balance = 0;
> struct sched_group *group;
> struct rq *busiest;
> unsigned long flags;
> @@ -4428,6 +4449,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> .sd = sd,
> .dst_cpu = this_cpu,
> .dst_rq = this_rq,
> + .dst_grpmask = sched_group_cpus(sd->groups),
> .idle = idle,
> .loop_break = sched_nr_migrate_break,
> .find_busiest_queue = find_busiest_queue,
> @@ -4461,6 +4483,7 @@ redo:
> schedstat_add(sd, lb_imbalance[idle], env.imbalance);
>
> ld_moved = 0;
> + old_ld_moved = 0;
> if (busiest->nr_running > 1) {
> /*
> * Attempt to move tasks. If find_busiest_group has found
> @@ -4488,12 +4511,27 @@ more_balance:
> env.flags &= ~LBF_NEED_BREAK;
> goto more_balance;
> }
> -
> /*
> * some other cpu did the load balance for us.
> */
> - if (ld_moved && this_cpu != smp_processor_id())
> - resched_cpu(this_cpu);
> + if ((ld_moved != old_ld_moved) &&
> + (env.dst_cpu != smp_processor_id()))
> + resched_cpu(env.dst_cpu);
The whole old_ld_moved stuff sucks, and preferably needs a rename, or at
least a comment.
> + if ((env.flags & LBF_NEW_DST_CPU) && (env.imbalance > 0)) {
> + /*
> + * we could not balance completely as some tasks
> + * were not allowed to move to the dst_cpu, so try
> + * again with new_dst_cpu.
> + */
> + this_rq = cpu_rq(env.new_dst_cpu);
> + env.dst_rq = this_rq;
> + env.dst_cpu = env.new_dst_cpu;
> + env.flags &= ~LBF_NEW_DST_CPU;
> + env.loop = 0;
> + old_ld_moved = ld_moved;
> + goto more_balance;
> + }
>
OK, so previously we only pulled to ourselves, now you make cpu x move
from cpu y to cpu z. This changes the dynamic of the load-balancer, not
a single word on that and its impact/ramifications.