From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: Gerd Hoffmann <kraxel@suse.de>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] scheduler issue & patch
Date: Mon, 12 Jun 2006 10:28:58 -0700
Message-ID: <20060612102847.A5687@unix-os.sc.intel.com>
In-Reply-To: <448D88A2.1060002@suse.de>; from kraxel@suse.de on Mon, Jun 12, 2006 at 05:30:42PM +0200

On Mon, Jun 12, 2006 at 05:30:42PM +0200, Gerd Hoffmann wrote:
> Hi,
>
> I'm looking into a scheduler issue with a NUMA box and scheduling
> domains. The machine is a dual-core opteron with two nodes, i.e.
> four cpus. cpu0+1 build node0, cpu2+3 build node1.
>
> Now I have an application (benchmark) with two threads which performs
> best when the two threads are running on different nodes (probably
> because the cpus on each node share the L2 cache). The scheduler tends
> to keep threads on the local node though, which probably makes sense in
> most cases because local memory is faster.
>
> Ok, we have tools to give hints to the scheduler (taskset, numactl).
> The problem is it doesn't work well. I can ask the scheduler to use
> cpu1 (node0) and cpu3 (node1) only (via "taskset 0x0a"). But the
> scheduler very often schedules both threads on the same cpu :-(
>
> I think the reason is that the scheduler always checks the complete cpu
> groups when calculating the group load, without looking at
> task->cpus_allowed. So we have the effect that the scheduler walks down
> the scheduler domain tree, looks at the group for node0, looks at both
> cpu0 and cpu1, finds node0 being not overloaded due to cpu0 being idle
> and decides to keep the thread on the local node. Next it walks down
> the tree and finds it isn't allowed to use the idle cpu0. So both
> threads get scheduled to cpu1. Oops.
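To make the scenario above concrete, here is a toy model in C of a group-load average that either ignores or honors an affinity mask. It is only a sketch and not the kernel's code: group_avg_load(), NR_CPUS = 4 and a load of 128 per running task are all assumptions made for illustration.

```c
#include <assert.h>

#define NR_CPUS 4 /* assumed: the four-cpu box from the report */

/*
 * Hypothetical helper, not a kernel function: average load over the cpus
 * that are both in `group` and in `allowed`. Bit n of a mask stands for
 * cpu n. Returns -1 if no cpu in the group is allowed.
 */
long group_avg_load(const unsigned long load[NR_CPUS],
                    unsigned long group, unsigned long allowed)
{
    unsigned long cpus = group & allowed;
    long sum = 0;
    int n = 0;

    /* The group mask must not name cpus beyond NR_CPUS. */
    assert((group >> NR_CPUS) == 0);

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpus & (1UL << cpu)) {
            sum += (long)load[cpu];
            n++;
        }
    }
    return n ? sum / n : -1;
}
```

With loads {0, 128, 0, 0} and node0 = 0x3, passing 0xf as the allowed mask gives an average of 64, while passing the taskset mask 0x0a gives 128, since only cpu1 is counted.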
I don't think the problem is in sched_balance_self(). It is probably doing
the right thing based on the load that is present at the time of fork/exec.
Once node-1 becomes idle, we expect the two threads on node-0 cpu-1 to get
distributed between the two nodes.

Perhaps the real issue is how cpu_power is calculated for the node domain
on these systems. Because the cpus in a node share resources, cpu_power
for a group in the node domain should be < 2 * SCHED_LOAD_SCALE. Once
that is the case, find_busiest_group() should detect the imbalance and
move one of the threads from cpu-1 (node-0) to cpu-3 (node-1).
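
As a rough illustration of the cpu_power argument, the sketch below scales a group's raw load by its cpu_power. It is a simplified model and not the find_busiest_group() code: scaled_group_load() is a hypothetical helper, and SCHED_LOAD_SCALE = 128 and the example cpu_power of 192 are assumed values.

```c
#include <assert.h>

#define SCHED_LOAD_SCALE 128UL /* assumed value, for illustration only */

/*
 * Hypothetical helper, not a kernel function: express a group's raw load
 * relative to its capacity. A result above SCHED_LOAD_SCALE means the
 * group carries more load than its cpu_power accounts for.
 */
unsigned long scaled_group_load(unsigned long raw_load,
                                unsigned long cpu_power)
{
    assert(cpu_power > 0); /* avoid division by zero */
    return raw_load * SCHED_LOAD_SCALE / cpu_power;
}
```

Two running tasks (raw load 256) on a group with cpu_power 256 scale to 128, which looks like a group running exactly at capacity. With cpu_power 192 the same load scales to 170, above SCHED_LOAD_SCALE, so in this model the group appears overloaded relative to an idle node.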
> The patch attached takes the sledgehammer approach to fix it: In case
> we have a non-default cpumask in task->cpus_allowed the scheduler
> ignores all the fancy scheduling domains and simply spreads the load
> equally over the cpus allowed by task->cpus_allowed. Not exactly
> elegant, but works. Not each time, but very often.
>
> Comments? Ideas how to solve this better? I've also tried to play with
> the group load calculation, but it didn't work well. I'm kinda lost in
> all those scheduler tuning knobs ...
In my opinion, this patch is not the correct fix for the issue.
thanks,
suresh