linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Shi <alex.shi@intel.com>
To: li guang <lig.fnst@cn.fujitsu.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"bp@alien8.de" <bp@alien8.de>, "pjt@google.com" <pjt@google.com>,
	"namhyung@kernel.org" <namhyung@kernel.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake
Date: Fri, 11 Jan 2013 22:56:35 +0800	[thread overview]
Message-ID: <50F02823.1070909@intel.com> (raw)
In-Reply-To: <1357891316.2860.6.camel@liguang.fnst.cn.fujitsu.com>

On 01/11/2013 04:01 PM, li guang wrote:
> 在 2013-01-11五的 10:26 +0530,Preeti U Murthy写道:
>> Hi Morten,Alex
>>
>> On 01/09/2013 11:51 PM, Morten Rasmussen wrote:
>>> On Sat, Jan 05, 2013 at 08:37:34AM +0000, Alex Shi wrote:
>>>> Guess the search cpu from bottom to up in domain tree come from
>>>> commit 3dbd5342074a1e sched: multilevel sbe sbf, the purpose is
>>>> balancing over tasks on all level domains.
>>>>
>>>> This balancing cost much if there has many domain/groups in a large
>>>> system. And force spreading task among different domains may cause
>>>> performance issue due to bad locality.
>>>>
>>>> If we remove this code, we will get quick fork/exec/wake, plus better
>>>> balancing among whole system, that also reduce migrations in future
>>>> load balancing.
>>>>
>>>> This patch increases 10+% performance of hackbench on my 4 sockets
>>>> NHM and SNB machines.
>>>>
>>>> Signed-off-by: Alex Shi <alex.shi@intel.com>
>>>> ---
>>>>  kernel/sched/fair.c | 20 +-------------------
>>>>  1 file changed, 1 insertion(+), 19 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index ecfbf8e..895a3f4 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3364,15 +3364,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  		goto unlock;
>>>>  	}
>>>>  
>>>> -	while (sd) {
>>>> +	if (sd) {
>>>>  		int load_idx = sd->forkexec_idx;
>>>>  		struct sched_group *group;
>>>> -		int weight;
>>>> -
>>>> -		if (!(sd->flags & sd_flag)) {
>>>> -			sd = sd->child;
>>>> -			continue;
>>>> -		}
>>>>  
>>>>  		if (sd_flag & SD_BALANCE_WAKE)
>>>>  			load_idx = sd->wake_idx;
>>>> @@ -3382,18 +3376,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>>>  			goto unlock;
>>>>  
>>>>  		new_cpu = find_idlest_cpu(group, p, cpu);
>>>> -
>>>> -		/* Now try balancing at a lower domain level of new_cpu */
>>>> -		cpu = new_cpu;
>>>> -		weight = sd->span_weight;
>>>> -		sd = NULL;
>>>> -		for_each_domain(cpu, tmp) {
>>>> -			if (weight <= tmp->span_weight)
>>>> -				break;
>>>> -			if (tmp->flags & sd_flag)
>>>> -				sd = tmp;
>>>> -		}
>>>> -		/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the new
>>> cpu's domain. Is this to handle some exotic sched domain configurations?
>>> If so, they probably wouldn't work with your optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>  _________________________  sd0
>> |                         |
>> |  ___sd1__   ___sd2__    |
>> | |        | |        |   |
>> | | sgx    | |  sga   |   |
>> | | sgy    | |  sgb   |   |
>> | |________| |________|   |
>> |_________________________|
>>
>> What the current recursive search is doing is (assuming we start with
>> sd0-the top level sched domain whose flags are rightly set). we find
>> that sd1 is the idlest group,and a cpux1 in sgx is the idlest cpu.
>>
>> We could have ideally stopped the search here.But the problem with this
>> is that there is a possibility that sgx is more loaded than sgy; meaning
>> the cpus in sgx are heavily imbalanced;say there are two cpus cpux1 and
>> cpux2 in sgx,where cpux2 is heavily loaded and cpux1 has recently gotten
>> idle and load balancing has not come to its rescue yet.According to the
>> search above, cpux1 is idle,but is *not the right candidate for
>> scheduling forked task,it is the right candidate for relieving the load
>> from cpux2* due to cache locality etc.
> 
> This corner case may occur after "[PATCH v3 03/22] sched: fix
> find_idlest_group mess logical" brought in the local sched_group bias,
> and assume balancing runs on cpux2.
> ideally,  find_idlest_group should find the real idlest(this case: sgy),
> then, this patch is reasonable.
> 

Sure. but seems it is a bit hard to go down the idlest group.

and the old logical is real cost too much, on my 2 socket NHM/SNB
server, hackbench can increase 2~5% performance. and no clean
performance on kbuild/aim7 etc.

-- 
Thanks
    Alex

  reply	other threads:[~2013-01-11 14:56 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-05  8:37 [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Alex Shi
2013-01-05  8:37 ` [PATCH v3 01/22] sched: set SD_PREFER_SIBLING on MC domain to reduce a domain level Alex Shi
2013-01-05  8:37 ` [PATCH v3 02/22] sched: select_task_rq_fair clean up Alex Shi
2013-01-11  4:57   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 03/22] sched: fix find_idlest_group mess logical Alex Shi
2013-01-11  4:59   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 04/22] sched: don't need go to smaller sched domain Alex Shi
2013-01-09 17:38   ` Morten Rasmussen
2013-01-10  3:16     ` Mike Galbraith
2013-01-11  5:02   ` Preeti U Murthy
2013-01-05  8:37 ` [PATCH v3 05/22] sched: remove domain iterations in fork/exec/wake Alex Shi
2013-01-09 18:21   ` Morten Rasmussen
2013-01-11  2:46     ` Alex Shi
2013-01-11 10:07       ` Morten Rasmussen
2013-01-11 14:50         ` Alex Shi
2013-01-14  8:55         ` li guang
2013-01-14  9:18           ` Alex Shi
2013-01-11  4:56     ` Preeti U Murthy
2013-01-11  8:01       ` li guang
2013-01-11 14:56         ` Alex Shi [this message]
2013-01-14  9:03           ` li guang
2013-01-15  2:34             ` Alex Shi
2013-01-16  1:54               ` li guang
2013-01-11 10:54       ` Morten Rasmussen
2013-01-16  5:43       ` Alex Shi
2013-01-16  7:41         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 06/22] sched: load tracking bug fix Alex Shi
2013-01-05  8:37 ` [PATCH v3 07/22] sched: set initial load avg of new forked task Alex Shi
2013-01-11  5:10   ` Preeti U Murthy
2013-01-11  5:44     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 08/22] sched: update cpu load after task_tick Alex Shi
2013-01-05  8:37 ` [PATCH v3 09/22] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task Alex Shi
2013-01-05  8:56   ` Alex Shi
2013-01-06  7:54     ` Alex Shi
2013-01-06 18:31       ` Linus Torvalds
2013-01-07  7:00         ` Preeti U Murthy
2013-01-08 14:27         ` Alex Shi
2013-01-11  6:31         ` Alex Shi
2013-01-21 14:47           ` Alex Shi
2013-01-22  3:20             ` Alex Shi
2013-01-22  6:55               ` Mike Galbraith
2013-01-22  7:50                 ` Alex Shi
2013-01-22  9:52                   ` Mike Galbraith
2013-01-23  0:36                     ` Alex Shi
2013-01-23  1:47                       ` Mike Galbraith
2013-01-23  2:01                         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 10/22] sched: consider runnable load average in move_tasks Alex Shi
2013-01-05  8:37 ` [PATCH v3 11/22] sched: consider runnable load average in effective_load Alex Shi
2013-01-10 11:28   ` Morten Rasmussen
2013-01-11  3:26     ` Alex Shi
2013-01-14 12:01       ` Morten Rasmussen
2013-01-16  5:30         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 12/22] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking" Alex Shi
2013-01-05  8:37 ` [PATCH v3 13/22] sched: add sched_policy in kernel Alex Shi
2013-01-05  8:37 ` [PATCH v3 14/22] sched: add sched_policy and it's sysfs interface Alex Shi
2013-01-14  6:53   ` Namhyung Kim
2013-01-14  8:11     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 15/22] sched: log the cpu utilization at rq Alex Shi
2013-01-10 11:40   ` Morten Rasmussen
2013-01-11  3:30     ` Alex Shi
2013-01-14 13:59       ` Morten Rasmussen
2013-01-16  5:53         ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 16/22] sched: add power aware scheduling in fork/exec/wake Alex Shi
2013-01-10 15:01   ` Morten Rasmussen
2013-01-11  7:08     ` Alex Shi
2013-01-14 16:09       ` Morten Rasmussen
2013-01-16  6:02         ` Alex Shi
2013-01-16 14:27           ` Morten Rasmussen
2013-01-17  5:47             ` Namhyung Kim
2013-01-18 13:41               ` Alex Shi
2013-01-14  7:03   ` Namhyung Kim
2013-01-14  8:30     ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing Alex Shi
2013-01-10 17:17   ` Morten Rasmussen
2013-01-11  3:47     ` Alex Shi
2013-01-14  7:13       ` Namhyung Kim
2013-01-16  6:11         ` Alex Shi
2013-01-16 12:52           ` Namhyung Kim
2013-01-14 17:00       ` Morten Rasmussen
2013-01-16  7:32         ` Alex Shi
2013-01-16 15:08           ` Morten Rasmussen
2013-01-18 14:06             ` Alex Shi
2013-01-05  8:37 ` [PATCH v3 18/22] sched: add power/performance balance allowed flag Alex Shi
2013-01-05  8:37 ` [PATCH v3 19/22] sched: pull all tasks from source group Alex Shi
2013-01-05  8:37 ` [PATCH v3 20/22] sched: don't care if the local group has capacity Alex Shi
2013-01-05  8:37 ` [PATCH v3 21/22] sched: power aware load balance, Alex Shi
2013-01-05  8:37 ` [PATCH v3 22/22] sched: lazy powersaving balance Alex Shi
2013-01-14  8:39   ` Namhyung Kim
2013-01-14  8:45     ` Alex Shi
2013-01-09 17:16 ` [PATCH V3 0/22] sched: simplified fork, enable load average into LB and power awareness scheduling Morten Rasmussen
2013-01-10  3:49   ` Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50F02823.1070909@intel.com \
    --to=alex.shi@intel.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=efault@gmx.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=lig.fnst@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).