Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Alex Shi <alex.shi@intel.com>
To: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"bp@alien8.de" <bp@alien8.de>, "pjt@google.com" <pjt@google.com>,
	"namhyung@kernel.org" <namhyung@kernel.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
Date: Wed, 16 Jan 2013 15:41:33 +0800	[thread overview]
Message-ID: <50F659AD.70405@intel.com> (raw)
In-Reply-To: <50F63E16.1060903@intel.com>

On 01/16/2013 01:43 PM, Alex Shi wrote:
> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the new
>>> cpu's domain. Is this to handle some exotic sched domain configurations?
>>> If so, they probably wouldn't work with your optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>  _________________________  sd0
>> |                         |
>> |  ___sd1__   ___sd2__    |
>> | |        | |        |   |
>> | | sgx    | |  sga   |   |
>> | | sgy    | |  sgb   |   |
>> | |________| |________|   |
>> |_________________________|
>>
>> What the current recursive search is doing is (assuming we start with
>> sd0-the top level sched domain whose flags are rightly set). we find
>> that sd1 is the idlest group,and a cpux1 in sgx is the idlest cpu.
>>
>> We could have ideally stopped the search here.But the problem with this
>> is that there is a possibility that sgx is more loaded than sgy; meaning
>> the cpus in sgx are heavily imbalanced;say there are two cpus cpux1 and
>> cpux2 in sgx,where cpux2 is heavily loaded and cpux1 has recently gotten
>> idle and load balancing has not come to its rescue yet.According to the
>> search above, cpux1 is idle,but is *not the right candidate for
>> scheduling forked task,it is the right candidate for relieving the load
>> from cpux2* due to cache locality etc.
> 
> The problem still exists on the current code. It still goes to cpux1.
> and then goes up to sgx to seek idlest group ... idlest cpu, and back to
> cpux1 again. nothing help.
> 
> 

to resolve the problem, I has tried to walk domains from top down. but testing
show aim9/hackbench performance is not good on our SNB EP. and no change on other platforms.
---
@@ -3351,51 +3363,33 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 
 
-       while (sd) {
+       for_each_lower_domain(sd) {
                int load_idx = sd->forkexec_idx;
-               struct sched_group *group;
-               int weight;
-
-               if (!(sd->flags & sd_flag)) {
-                       sd = sd->child;
-                       continue;
-               }
+               int local = 0;
 
                if (sd_flag & SD_BALANCE_WAKE)
                        load_idx = sd->wake_idx;
 
-               group = find_idlest_group(sd, p, cpu, load_idx);
-               if (!group) {
-                       sd = sd->child;
-                       continue;
-               }
-
-               new_cpu = find_idlest_cpu(group, p, cpu);
-               if (new_cpu == -1 || new_cpu == cpu) {
-                       /* Now try balancing at a lower domain level of cpu */
-                       sd = sd->child;
+               group = find_idlest_group(sd, p, cpu, load_idx, &local);
+               if (local)
                        continue;
-               }
+               if (!group)
+                       goto unlock;
 
-               /* Now try balancing at a lower domain level of new_cpu */
-               cpu = new_cpu;
-               weight = sd->span_weight;
-               sd = NULL;
-               for_each_domain(cpu, tmp) {
-                       if (weight <= tmp->span_weight)
-                               break;
-                       if (tmp->flags & sd_flag)
+               /* go down from non-local group */
+               for_each_domain(group_first_cpu(group), tmp)
+                       if (cpumask_equal(sched_domain_span(tmp),
+                                               sched_group_cpus(group))) {
                                sd = tmp;
-               }
-               /* while loop will break here if sd == NULL */
+                               break;
+                       }
        }
+       if (group)
+               new_cpu = find_idlest_cpu(group, p, cpu);
 unlock:
        rcu_read_unlock();



-- 
Thanks Alex

next prev parent reply	other threads:[~2013-01-16  7:40 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-05  8:37 [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Alex Shi
2013-01-05  8:37 ` [PATCH v3 01/22] sched: set SD_PREFER_SIBLING on MC domain to reduce a domain level Alex Shi
2013-01-05  8:37 ` [PATCH v3 02/22] sched: select_task_rq_fair clean up Alex Shi
2013-01-11  4:57   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 03/22] sched: fix find_idlest_group mess logical Alex Shi
2013-01-11  4:59   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 04/22] sched: don't need go to smaller sched domain Alex Shi
2013-01-09 17:38   ` Morten Rasmussen
2013-01-10  3:16     ` Mike Galbraith
2013-01-11  5:02   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake Alex Shi
2013-01-09 18:21   ` Morten Rasmussen
2013-01-11  2:46     ` Alex Shi
2013-01-11 10:07       ` Morten Rasmussen
2013-01-11 14:50         ` Alex Shi
2013-01-14  8:55         ` li guang
2013-01-14  9:18           ` Alex Shi
2013-01-11  4:56     ` Preeti U Murthy
2013-01-11  8:01       ` li guang
2013-01-11 14:56         ` Alex Shi
2013-01-14  9:03           ` li guang
2013-01-15  2:34             ` Alex Shi
2013-01-16  1:54               ` li guang
2013-01-11 10:54       ` Morten Rasmussen
2013-01-16  5:43       ` Alex Shi
2013-01-16  7:41         ` Alex Shi [this message]
2013-01-05  8:37 ` [PATCH v3 06/22] sched: load tracking bug fix Alex Shi
2013-01-05  8:37 ` [PATCH v3 07/22] sched: set initial load avg of new forked task Alex Shi
2013-01-11  5:10   ` Preeti U Murthy
2013-01-11  5:44     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 08/22] sched: update cpu load after task_tick Alex Shi
2013-01-05  8:37 ` [PATCH v3 09/22] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-01-05  8:56   ` Alex Shi
2013-01-06  7:54     ` Alex Shi
2013-01-06 18:31       ` Linus Torvalds
2013-01-07  7:00         ` Preeti U Murthy
2013-01-08 14:27         ` Alex Shi
2013-01-11  6:31         ` Alex Shi
2013-01-21 14:47           ` Alex Shi
2013-01-22  3:20             ` Alex Shi
2013-01-22  6:55               ` Mike Galbraith
2013-01-22  7:50                 ` Alex Shi
2013-01-22  9:52                   ` Mike Galbraith
2013-01-23  0:36                     ` Alex Shi
2013-01-23  1:47                       ` Mike Galbraith
2013-01-23  2:01                         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 10/22] sched: consider runnable load average in move_tasks Alex Shi
2013-01-05  8:37 ` [PATCH v3 11/22] sched: consider runnable load average in effective_load Alex Shi
2013-01-10 11:28   ` Morten Rasmussen
2013-01-11  3:26     ` Alex Shi
2013-01-14 12:01       ` Morten Rasmussen
2013-01-16  5:30         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 12/22] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2013-01-05  8:37 ` [PATCH v3 13/22] sched: add sched_policy in kernel Alex Shi
2013-01-05  8:37 ` [PATCH v3 14/22] sched: add sched_policy and it's sysfs interface Alex Shi
2013-01-14  6:53   ` Namhyung Kim
2013-01-14  8:11     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 15/22] sched: log the cpu utilization at rq Alex Shi
2013-01-10 11:40   ` Morten Rasmussen
2013-01-11  3:30     ` Alex Shi
2013-01-14 13:59       ` Morten Rasmussen
2013-01-16  5:53         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 16/22] sched: add power aware scheduling in fork/exec/wake Alex Shi
2013-01-10 15:01   ` Morten Rasmussen
2013-01-11  7:08     ` Alex Shi
2013-01-14 16:09       ` Morten Rasmussen
2013-01-16  6:02         ` Alex Shi
2013-01-16 14:27           ` Morten Rasmussen
2013-01-17  5:47             ` Namhyung Kim
2013-01-18 13:41               ` Alex Shi
2013-01-14  7:03   ` Namhyung Kim
2013-01-14  8:30     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing Alex Shi
2013-01-10 17:17   ` Morten Rasmussen
2013-01-11  3:47     ` Alex Shi
2013-01-14  7:13       ` Namhyung Kim
2013-01-16  6:11         ` Alex Shi
2013-01-16 12:52           ` Namhyung Kim
2013-01-14 17:00       ` Morten Rasmussen
2013-01-16  7:32         ` Alex Shi
2013-01-16 15:08           ` Morten Rasmussen
2013-01-18 14:06             ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 18/22] sched: add power/performance balance allowed flag Alex Shi
2013-01-05  8:37 ` [PATCH v3 19/22] sched: pull all tasks from source group Alex Shi
2013-01-05  8:37 ` [PATCH v3 20/22] sched: don't care if the local group has capacity Alex Shi
2013-01-05  8:37 ` [PATCH v3 21/22] sched: power aware load balance, Alex Shi
2013-01-05  8:37 ` [PATCH v3 22/22] sched: lazy powersaving balance Alex Shi
2013-01-14  8:39   ` Namhyung Kim
2013-01-14  8:45     ` Alex Shi
2013-01-09 17:16 ` [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Morten Rasmussen
2013-01-10  3:49   ` Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50F659AD.70405@intel.com \
    --to=alex.shi@intel.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=efault@gmx.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.