From: Darren Hart <dvhltc@us.ibm.com>
To: "lkml, " <linux-kernel@vger.kernel.org>
Cc: "Piggin, Nick" <piggin@cyberone.com.au>,
"Bligh, Martin" <mjbligh@us.ibm.com>,
"Dobson, Matt" <colpatch@us.ibm.com>
Subject: sched_domains SD_BALANCE_FORK and sched_balance_self
Date: Fri, 05 Aug 2005 16:29:45 -0700 [thread overview]
Message-ID: <42F3F669.2080101@us.ibm.com> (raw)
First off, apologies for not reviewing this code at 2.6.12-mm2, I was
tied up with other things. I have some concerns as to the intent vs.
actual implementation of SD_BALANCE_FORK and the sched_balance_fork()
routine.
ARCHS=i386,x86_64,ia64
First, iirc SD_NODE_INIT initializes the sched_domain that contains all
the cpus in the system in node groups (with the exception of ia64 and
the allnodes domain). Correct?
SD_NODE_INIT for $ARCHS contains SD_BALANCE_FORK, and no other SD_*_INIT
routines do. This seems strange to me as it would seem more appropriate
to balance within a node on fork as to not have to access the duplicated
mm across nodes. If we are going to use SD_BALANCE_FORK, wouldn't it
make sense to push it down the sched_domain hierarchy to the SD_CPU_INIT
level?
And now to sched_balance_self(). As I understand it the purpose of this
method is to choose the "best" cpu to start a task on. It seems to me
that the best CPU for a forked process would be an idle CPU on the same
node as the parent in order to stay close to it's memory. Failing this,
we may need to move to other nodes if they are idle enough to warrant
the move across node boundaries. Thoughts?
For the comments below, flag = SD_BALANCE_FORK.
static int sched_balance_self(int cpu, int flag)
{
struct task_struct *t = current;
struct sched_domain *tmp, *sd = NULL;
for_each_domain(cpu, tmp)
if (tmp->flags & flag)
sd = tmp;
This jumps to the highest level domain that supports SD_BALANCE_FORK
which are the domains created with SD_NODE_INIT, so we look at all the
CPUs on the system first rather than those local to the parent's node.
while (sd) {
cpumask_t span;
struct sched_group *group;
int new_cpu;
int weight;
span = sd->span;
group = find_idlest_group(sd, t, cpu);
if (!group)
goto nextlevel;
new_cpu = find_idlest_cpu(group, cpu);
if (new_cpu == -1 || new_cpu == cpu)
goto nextlevel;
/* Now try balancing at a lower domain level */
cpu = new_cpu;
nextlevel:
sd = NULL;
weight = cpus_weight(span);
for_each_domain(cpu, tmp) {
if (weight <= cpus_weight(tmp->span))
break;
if (tmp->flags & flag)
sd = tmp;
}
If I am reading it right, this for_each_domain will exit immediately if
jumped to via nextlevel and will only do any work if a new cpu is found
to run on (which is fair sense there is no need to keep looking if the
whole system doesn't have a better place for us to go). If a new cpu
_is_ assigned though, for_each_domain will start with the lowest level
domain - which always has the smallest cpus_weight doesn't it? If so,
won't the (weight <= cpu...) condition always equate to true, ending the
loop at the first domain? If so, then that last loop doesn't do
anything at all, ever. If I am misreading this fragment, could someone
please correct my thinking?
/* while loop will break here if sd == NULL */
}
return cpu;
}
So it seems to me that we should first look at the cpus on the local
domain for SD_BALANCE_FORK. SD_BALANCE_EXEC tasks however have a
minimal mm and could probably be put on whichever cpu/group is the most
idle, regardless of node. Thoughts?
Thanks,
--
Darren Hart
IBM Linux Technology Center
Linux Kernel Team
Phone: 503 578 3185
T/L: 775 3185
next reply other threads:[~2005-08-05 23:23 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-08-05 23:29 Darren Hart [this message]
2005-08-09 22:03 ` sched_domains SD_BALANCE_FORK and sched_balance_self Siddha, Suresh B
2005-08-09 22:19 ` Martin J. Bligh
2005-08-10 0:40 ` Siddha, Suresh B
2005-08-10 0:43 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=42F3F669.2080101@us.ibm.com \
--to=dvhltc@us.ibm.com \
--cc=colpatch@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mjbligh@us.ibm.com \
--cc=piggin@cyberone.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox