From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Marcus Granado <Marcus.Granado@eu.citrix.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Anil Madhavapeddy <anil@recoil.org>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>,
	Daniel De Graaf <dgdegra@tycho.nsa.gov>,
	Matt Wilson <msw@amazon.com>
Subject: Re: [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
Date: Fri, 21 Dec 2012 14:29:09 +0000
Message-ID: <50D47235.4090106@eu.citrix.com>
In-Reply-To: <CAAWQectVEihrayJj5n4SPGqA0QJSiC7s2x_oDW=KHyxukWpMSA@mail.gmail.com>

On 20/12/12 18:18, Dario Faggioli wrote:
> On Thu, Dec 20, 2012 at 5:48 PM, George Dunlap
> <george.dunlap@eu.citrix.com> wrote:
>> And in any case, looking at the caller of csched_load_balance(), it
>> explicitly says to steal work if the next thing on the runqueue of cpu has a
>> priority of TS_OVER.  That was chosen for a reason -- if you want to change
>> that, you should change it there at the top (and make a justification for
>> doing so), not deeply nested in a function like this.
>>
>> Or am I completely missing something?
>>
> No, you're right. Trying to solve a nasty issue I was seeing, I overlooked
> that I was changing the underlying logic up to that point... Thanks!
>
> What I want to avoid is the following: a vcpu wakes up on the busy pcpu Y. As
> a consequence, the idle pcpu X is tickled. Then, for some unrelated reason,
> pcpu Z reschedules and, as it would go idle too, it looks around for any vcpu
> to steal, finds one in Y's runqueue and grabs it. Afterward, when X gets the
> IPI and schedules, it does not find anything to run and goes back to idling.
>
> Now, suppose the vcpu has X, but *not* Z, in its node-affinity (while it has
> a full vcpu-affinity, i.e., it can run everywhere). In this case, a vcpu that
> could have run on a pcpu in its node-affinity ends up executing outside of
> it. That happens because the NODE_BALANCE_STEP in csched_load_balance(), when
> called by Z, won't find anything suitable to steal (provided there actually
> isn't any vcpu waiting in any runqueue that has Z in its node-affinity),
> while the CPU_BALANCE_STEP will find our vcpu. :-(
>
> So, what I wanted is something that could tell me whether the pcpu which is
> stealing work is the one that has actually been tickled to do so. I was then
> using the pcpu idleness as a (cheap and easy to check) indication of that,
> but I now see this has side effects that I did not want to cause in the
> first place.
>
> Sorry for that, I probably spent so much time buried, as you were saying, in
> the various nested loops and calls, that I lost the context a little bit! :-P
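
To make the scenario above concrete, here is a toy model in Python. It is
not Xen code: `Vcpu`, `steal_for` and the dict of runqueues are made-up
names, and the two balance steps are reduced to "match on node-affinity
first, then on vcpu-affinity". The step names are borrowed from the patch
only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vcpu:
    name: str
    cpu_affinity: frozenset   # pcpus the vcpu is allowed to run on
    node_affinity: frozenset  # pcpus the vcpu would prefer to run on


def steal_for(stealer, runqueues):
    """Toy two-step balance run by `stealer` before it goes idle.

    Step 1 only accepts vcpus with `stealer` in their node-affinity,
    step 2 accepts any vcpu with `stealer` in its vcpu-affinity.
    Returns (vcpu, step), or (None, None) if nothing can be stolen.
    """
    for step in ("NODE_BALANCE_STEP", "CPU_BALANCE_STEP"):
        for peer, queue in runqueues.items():
            if peer == stealer:
                continue
            for vcpu in queue:
                if step == "NODE_BALANCE_STEP":
                    mask = vcpu.node_affinity
                else:
                    mask = vcpu.cpu_affinity
                if stealer in mask:
                    queue.remove(vcpu)  # safe: we return right away
                    return vcpu, step
    return None, None


# The vcpu can run anywhere, but its node-affinity has X and not Z.
v = Vcpu("v", frozenset({"X", "Y", "Z"}), frozenset({"X"}))
runqueues = {"X": [], "Y": [v], "Z": []}

# Z happens to reschedule before X has handled the tickle IPI.
stolen_by_z = steal_for("Z", runqueues)
# X finally schedules and looks for the work it was tickled for.
found_by_x = steal_for("X", runqueues)
```

In this run Z finds nothing in the node step, takes the vcpu in the CPU
step, and X is left with nothing to run, which is the outcome described
in the quoted text.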

OK, that makes sense -- I figured it was something like that.  Don't 
feel too bad about missing that connection -- we're all fairly blind to 
our own code, and I only caught it because I was trying to figure out 
what was going on.  That's why we do patch review. :-)

Honestly, the whole "steal work" idea seemed a bit backwards to begin
with; and now that we're dealing not just with "possible" and "not
possible", but with "better" and "worse", the work-stealing method of
load balancing sort of falls down.

It does make sense to do the load-balancing work on idle cpus rather 
than already-busy cpus; but I wonder if what should happen instead is 
that before idling, a pcpu chooses a "busy" pcpu and does a global load 
balancing for it -- i.e., pcpu 1 will look at pcpu 5's runqueue, and 
consider moving away the vcpus on the runqueue not just to itself but to 
any available cpu.

That way, in your example, Z might wake up, look at Y's runqueue, and
say, "This would probably run well on X -- I'll migrate it there."

But that's kind of a half-baked idea at this point.
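
A rough sketch of what I mean, as a toy in Python rather than anything
resembling sched_credit.c. Everything here is hypothetical:
`place_from_busy`, its arguments, and the "prefer node-affinity, fall
back to vcpu-affinity" policy are assumptions made for the example.

```python
def place_from_busy(busy_queue, idle_pcpus, cpu_affinity, node_affinity):
    """Pick a target pcpu for each vcpu waiting on one busy pcpu.

    The pcpu doing the balancing considers every idle pcpu as a target,
    not only itself. An idle pcpu inside the vcpu's node-affinity is
    preferred; otherwise any idle pcpu allowed by its vcpu-affinity is
    used. Returns a dict {vcpu: target_pcpu}; vcpus with no possible
    target are left where they are.
    """
    free = set(idle_pcpus)
    moves = {}
    for vcpu in busy_queue:
        allowed = free & cpu_affinity[vcpu]
        preferred = allowed & node_affinity[vcpu]
        candidates = preferred or allowed
        if candidates:
            target = min(candidates)  # deterministic pick for the example
            moves[vcpu] = target
            free.discard(target)
    return moves


# Z is about to idle and balances on behalf of busy pcpu Y.
moves = place_from_busy(
    busy_queue=["v"],
    idle_pcpus={"X", "Z"},
    cpu_affinity={"v": {"X", "Y", "Z"}},
    node_affinity={"v": {"X"}},
)

# If X is not idle after all, Z is still an acceptable fallback.
fallback = place_from_busy(
    busy_queue=["v"],
    idle_pcpus={"Z"},
    cpu_affinity={"v": {"X", "Y", "Z"}},
    node_affinity={"v": {"X"}},
)
```

With both X and Z idle, the vcpu is sent to X even though Z is the one
doing the work; Z only takes it when nothing better is available.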

> Ok, I think the problem I was describing is real, and I've seen it happening
> and causing performance degradation. However, as I think a good solution is
> going to be more complex than I thought, I'd better repost without this
> function and deal with it in a future separate patch (after having figured
> out the best way of doing so). Is that fine with you?

Yes, that's fine.  Thanks, Dario.

  -George
