From: Rik van Riel <riel@redhat.com>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, morten.rasmussen@arm.com,
	mingo@kernel.org, george.mccollister@gmail.com,
	ktkhai@parallels.com, Mel Gorman <mgorman@suse.de>,
	"Vinod, Chegu" <chegu_vinod@hp.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>
Subject: Re: [PATCH] sched: wake up task on prev_cpu if not in SD_WAKE_AFFINE domain with cpu
Date: Tue, 13 May 2014 10:08:20 -0400
Message-ID: <53722754.6040102@redhat.com>
In-Reply-To: <1399694090.5146.13.camel@marge.simpson.net>

On 05/09/2014 11:54 PM, Mike Galbraith wrote:
> On Fri, 2014-05-09 at 14:16 -0400, Rik van Riel wrote:
> 
>> That leaves the big question: do we want to fall back to
>> prev_cpu if it is not idle, and it has an idle sibling,
>> or would it be better to find an idle sibling of prev_cpu
>> when we wake up a task?
> 
> Yes.  If there was A correct answer, this stuff would be a lot easier.

OK, after doing some other NUMA stuff, and then looking at the scheduler
again with a fresh mind, I have drawn some more conclusions about what
the scheduler does, and how it breaks NUMA locality :)

1) If the node_distance between nodes on a NUMA system is
   <= RECLAIM_DISTANCE, we will call select_idle_sibling for
   a wakeup of a previously existing task (SD_BALANCE_WAKE)

2) If the node distance exceeds RECLAIM_DISTANCE, we will
   wake up a task on prev_cpu, even if it is not currently
   idle

   This behaviour only happens on certain large NUMA systems,
   and is different from the behaviour on small systems.
   I suspect we will want to call select_idle_sibling with
   prev_cpu in case target and prev_cpu are not in the same
   SD_WAKE_AFFINE domain.

3) If wake_wide is false, we call select_idle_sibling with
   the CPU number of the code that is waking up the task

4) If wake_wide is true, we call select_idle_sibling with
   the CPU number the task was previously running on (prev_cpu)

   In effect, the "wake task on waking task's CPU" behaviour
   is the default, regardless of how frequently a task wakes up
   its wakee, and regardless of impact on NUMA locality.

   This may need to be changed.

5) select_idle_sibling will place the task on (3) or (4) only
   if the CPU is actually idle. If task A communicates with task
   B through a pipe or a socket, and does a sync wakeup, task
   B will never be placed on task A's CPU (not idle yet), and it
   will only be placed on its own previous CPU if it is currently
   idle.

6) If neither CPU is idle, select_idle_sibling will walk all the
   CPUs in the SD_SHARE_PKG_RESOURCES SD of the target. This looks
   correct to me, though it could result in more work by the load
   balancing code later on, since it does not take load into account
   at all. It is unclear if this needs any changes.
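
The six points above can be condensed into a small toy model. This is
only a sketch of the behaviour as described in this mail, not kernel
code: struct toy_system, toy_select_idle_sibling, toy_select_wake_cpu,
the NR_CPUS value and the nodes_affine flag are all made up for
illustration, and the cache-sharing check on prev_cpu is an assumption
of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* arbitrary size for the toy model */

/* Hypothetical stand-in for topology and idle state; not a kernel type. */
struct toy_system {
	bool idle[NR_CPUS];	/* is this CPU idle right now? */
	int llc_id[NR_CPUS];	/* same id == same SD_SHARE_PKG_RESOURCES domain */
	int node_id[NR_CPUS];	/* NUMA node of each CPU */
	bool nodes_affine;	/* models node distance <= RECLAIM_DISTANCE */
};

/*
 * Points 5 and 6: use target only if it is idle, then prev_cpu only if
 * it is idle (and, as an assumption of this sketch, shares cache with
 * target), then the first idle CPU sharing cache with target. Load is
 * not taken into account. If nothing is idle, fall back to target.
 */
static int toy_select_idle_sibling(const struct toy_system *s,
				   int target, int prev_cpu)
{
	assert(target >= 0 && target < NR_CPUS);
	assert(prev_cpu >= 0 && prev_cpu < NR_CPUS);

	if (s->idle[target])
		return target;

	if (s->llc_id[prev_cpu] == s->llc_id[target] && s->idle[prev_cpu])
		return prev_cpu;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (s->llc_id[cpu] == s->llc_id[target] && s->idle[cpu])
			return cpu;
	}

	return target;
}

/* Points 1 to 4: pick the CPU a woken task is placed on. */
static int toy_select_wake_cpu(const struct toy_system *s, int waker_cpu,
			       int prev_cpu, bool wake_wide)
{
	bool same_affine_domain =
		s->node_id[waker_cpu] == s->node_id[prev_cpu] ||
		s->nodes_affine;

	/* Point 2: no shared SD_WAKE_AFFINE domain, stay on prev_cpu
	 * even if it is busy. */
	if (!same_affine_domain)
		return prev_cpu;

	/* Points 3 and 4: the waker's CPU is the default target,
	 * prev_cpu is used only when wake_wide is true. */
	int target = wake_wide ? prev_cpu : waker_cpu;

	return toy_select_idle_sibling(s, target, prev_cpu);
}
```

With two 4-CPU nodes and every CPU busy, a cross-node wakeup in this
model lands on the busy prev_cpu when nodes_affine is false (point 2),
and on the waker's busy CPU when nodes_affine is true and wake_wide is
false, which is the locality problem described in (3) and (4).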

Am I overlooking anything?

What benchmarks should I run to test any changes I make?

Are there particular system types people want me to run tests with?

-- 
All rights reversed
