From: Alex Shi <alex.shi@intel.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: npiggin@kernel.dk, mingo@redhat.com, peterz@infradead.org,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH 02/10] sched: fix find_idlest_group mess logical
Date: Sat, 08 Dec 2012 20:12:02 +0800
Message-ID: <50C32E92.1060906@intel.com>
In-Reply-To: <CAFTL4hyVx6otU=PBsXLbd3dqsdem9CXPaxetp_zz+8UhqsS=hw@mail.gmail.com>
On 12/07/2012 04:33 PM, Frederic Weisbecker wrote:
> 2012/12/7 Alex Shi <alex.shi@intel.com>:
>> On 12/07/2012 08:56 AM, Frederic Weisbecker wrote:
>>> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>>>> There are 4 situations in the function:
>>>> 1, no group allows the task;
>>>> so min_load = ULONG_MAX, this_load = 0, idlest = NULL
>>>> 2, only the local group allows the task;
>>>> so min_load = ULONG_MAX, this_load assigned, idlest = NULL
>>>> 3, only a non-local group allows the task;
>>>> so min_load assigned, this_load = 0, idlest != NULL
>>>> 4, the local group and another group both allow the task;
>>>> so min_load assigned, this_load assigned, idlest != NULL
>>>>
>>>> The current logic returns NULL in the first 3 scenarios,
>>>> and still returns NULL in the 4th situation if the idlest group
>>>> is heavier than the local group.
>>>>
>>>> Actually, I think the groups in situations 2 and 3 are also eligible
>>>> to host the task. And in the 4th situation, I agree with biasing
>>>> toward the local group. Hence this patch.
>>>
>>> The way I understand the loop that uses this in select_task_rq_fair() is:
>>>
>>> a) start from the highest domain level we are allowed to run to
>>> migrate the task in
>>> b) from that top level domain, find the idlest group. If the idlest
>>> group contains current CPU, zoom in the child domain and repeat b). If
>>> the idlest group doesn't contain the current CPU, pick the idlest CPU
>>> from that group.
>>> c) In the end if we found no idler target than current CPU, then take it.
>>>
>>> So if you also return a group that contains current CPU from
>>> find_idlest_group(), you don't recursively zoom in the child domain
>>> anymore. find_idlest_cpu() will fix that for you but it may come with
>>> some cost because now it iterates through every CPU, or maybe half
>>> of them.
>>
>> Not exactly: the old logic won't select a CPU from the group in
>> situations 2 and 3. That is wrong, and may keep the task on prev_cpu
>> even when other idler, allowed CPUs exist.
>
> For situation 2 I don't understand the issue. If the current CPU belongs
> to the idlest group, we want to zoom in our lookup until we find a group
> idler than the current CPU's? If we eventually don't find one,
> then we fall back to the current CPU, don't we?
Falling back to the current CPU is not the best choice here. :)
The point of allowing situation 2 is that this_cpu may not be the idlest
CPU even within the local group; there may be other, idler CPUs in the
local group. But the old logic refuses to select the idlest CPU from the
local group just because no other group is eligible. That is the problem.
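To make the four situations concrete, here is a minimal user-space sketch in C. It is not the kernel code and not the patch itself: struct group_model, pick_old() and pick_new() are hypothetical names, the per-group average load is assumed to be precomputed, and pick_new() only reflects my reading of what the patch description asks for (return the local group in situation 2, the idlest group in situation 3, and bias toward the local group in situation 4). A return value of -1 stands in for NULL.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a sched_group; not the kernel structure. */
struct group_model {
	int allowed;            /* task's allowed CPUs intersect this group */
	int local;              /* group contains this_cpu */
	unsigned long avg_load; /* assumed precomputed average load */
};

/* Old decision as described in the thread: index of the chosen group, or -1. */
static int pick_old(const struct group_model *g, size_t n, int imbalance)
{
	unsigned long min_load = ULONG_MAX, this_load = 0;
	int idlest = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!g[i].allowed)
			continue;       /* group has no CPU the task may use */
		if (g[i].local) {
			this_load = g[i].avg_load;
		} else if (g[i].avg_load < min_load) {
			min_load = g[i].avg_load;
			idlest = (int)i;
		}
	}

	/* Situations 1 and 2: idlest is unset. Situation 3: this_load is 0. */
	if (idlest < 0 || 100 * this_load < (unsigned long)imbalance * min_load)
		return -1;
	return idlest;
}

/* Assumed new decision: situations 2 and 3 also yield an eligible group. */
static int pick_new(const struct group_model *g, size_t n, int imbalance)
{
	unsigned long min_load = ULONG_MAX, this_load = 0;
	int idlest = -1, local = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!g[i].allowed)
			continue;
		if (g[i].local) {
			this_load = g[i].avg_load;
			local = (int)i;
		} else if (g[i].avg_load < min_load) {
			min_load = g[i].avg_load;
			idlest = (int)i;
		}
	}

	if (idlest < 0)
		return local;   /* situation 1 gives -1, situation 2 gives local */
	if (local < 0)
		return idlest;  /* situation 3: only a non-local group is allowed */
	/* Situation 4: bias toward the local group. */
	if (100 * this_load < (unsigned long)imbalance * min_load)
		return local;
	return idlest;
}
```

With an imbalance factor of 112 (that is, 100 + (125 - 100) / 2 for an assumed imbalance_pct of 125), pick_old() yields -1 for situations 1 to 3, while pick_new() yields the local group in situation 2 and the non-local group in situation 3.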
>
> I just have a doubt to express. What does the final leaf child domain
> look like? Is it made of the current CPU only, or can it contain other
> siblings? In the first case we are fine. In the second one, if this
> domain is made of only one group of several CPUs, we are skipping the
> find_idlest_cpu() call for that group and choosing this_cpu by default,
> which is probably suboptimal?
Most of the time, the domain is not a leaf child domain. And even for a
leaf child domain with a single group, find_idlest_cpu will return quickly,
so that should not cause much trouble.
>
> Concerning situation 3, if this_cpu is not a CPU allowed by the task,
> we may indeed have an issue because find_idlest_group() doesn't seem
> to be selecting non-local groups in this case.
thanks!
> But your current fix
> still breaks the recursive find_idlest_group() on other cases and may
> not scale with big number of CPUs.
>
I don't understand how the recursive find_idlest_group can scale with a
large number of CPUs. :)
Actually, this patch set shows a 10+% performance gain on hackbench
on big machines, but only a 0~2% gain on 2-socket machines.
--
Thanks
Alex