public inbox for linux-kernel@vger.kernel.org
From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: Gerd Hoffmann <kraxel@suse.de>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] scheduler issue & patch
Date: Mon, 12 Jun 2006 10:28:58 -0700
Message-ID: <20060612102847.A5687@unix-os.sc.intel.com>
In-Reply-To: <448D88A2.1060002@suse.de>; from kraxel@suse.de on Mon, Jun 12, 2006 at 05:30:42PM +0200

On Mon, Jun 12, 2006 at 05:30:42PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> I'm looking into a scheduler issue with a NUMA box and scheduling
> domains.  The machine is a dual-core Opteron with two nodes, i.e.
> four cpus.  cpu0+1 form node0, cpu2+3 form node1.
> 
> Now I have an application (benchmark) with two threads which performs
> best when the two threads are running on different nodes (probably
> because the cpus on each node share the L2 cache).  The scheduler tends
> to keep threads on the local node though, which probably makes sense in
> most cases because local memory is faster.
> 
> Ok, we have tools to give hints to the scheduler (taskset, numactl).
> The problem is it doesn't work well.  I can ask the scheduler to use
> cpu1 (node0) and cpu3 (node1) only (via "taskset 0x0a").  But the
> scheduler very often schedules both threads on the same cpu :-(
> 
> I think the reason is that the scheduler always checks the complete cpu
> groups when calculating the group load, without looking at
> task->cpus_allowed.  So we have the effect that the scheduler walks down
> the scheduler domain tree, looks at the group for node0, looks at both
> cpu0 and cpu1, finds that node0 is not overloaded because cpu0 is idle
> and decides to keep the thread on the local node.  Next it walks down
> the tree and finds it isn't allowed to use the idle cpu0.  So both
> threads get scheduled to cpu1.  Oops.

I don't think the problem is with sched_balance_self(). sched_balance_self()
is probably doing the right thing based on the load present at the
time of fork/exec. Once node-1 becomes idle, we expect the two threads
on node-0 cpu-1 to be distributed between the two nodes.

Perhaps the real issue is how cpu_power is calculated for the node domain
on these systems. Because the cpus in a node share resources, cpu_power
for a group in the node domain should be < 2 * SCHED_LOAD_SCALE.

Once this is the case, find_busiest_group() should detect the imbalance and
move one of the threads from cpu-1 (node-0) to cpu-3 (node-1).

> The patch attached takes the sledgehammer approach to fix it:  In case
> we have a non-default cpumask in task->cpus_allowed the scheduler
> ignores all the fancy scheduling domains and simply spreads the load
> equally over the cpus allowed by task->cpus_allowed.  Not exactly
> elegant, but works.  Not each time, but very often.
> 
> Comments?  Ideas how to solve this better?  I've also tried to play with
> the group load calculation, but it didn't work well.  I'm kinda lost in
> all those scheduler tuning knobs ...

In my opinion, this patch is not the correct fix for the issue.

thanks,
suresh
