From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Marcus Granado <Marcus.Granado@eu.citrix.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Anil Madhavapeddy <anil@recoil.org>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>,
Daniel De Graaf <dgdegra@tycho.nsa.gov>,
Matt Wilson <msw@amazon.com>
Subject: Re: [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
Date: Fri, 21 Dec 2012 14:29:09 +0000
Message-ID: <50D47235.4090106@eu.citrix.com>
In-Reply-To: <CAAWQectVEihrayJj5n4SPGqA0QJSiC7s2x_oDW=KHyxukWpMSA@mail.gmail.com>
On 20/12/12 18:18, Dario Faggioli wrote:
> On Thu, Dec 20, 2012 at 5:48 PM, George Dunlap
> <george.dunlap@eu.citrix.com> wrote:
>> And in any case, looking at the caller of csched_load_balance(), it
>> explicitly says to steal work if the next thing on the runqueue of cpu has a
>> priority of TS_OVER. That was chosen for a reason -- if you want to change
>> that, you should change it there at the top (and make a justification for
>> doing so), not deeply nested in a function like this.
>>
>> Or am I completely missing something?
>>
> No, you're right. Trying to solve a nasty issue I was seeing, I overlooked
> that I was changing the underlying logic up to that point... Thanks!
>
> What I want to avoid is the following: a vcpu wakes up on the busy pcpu Y. As
> a consequence, the idle pcpu X is tickled. Then, for some unrelated reason,
> pcpu Z reschedules and, as it would go idle too, it looks around for any vcpu
> to steal, finds one in Y's runqueue and grabs it. Afterward, when X gets the
> IPI and schedules, it does not find anything to run and goes back to idling.
>
> Now, suppose the vcpu has X, but *not* Z, in its node-affinity (while it has
> a full vcpu-affinity, i.e., it can run everywhere). In this case, a vcpu that
> could have run on a pcpu in its node-affinity executes outside of it. That
> happens because the NODE_BALANCE_STEP in csched_load_balance(), when called
> by Z, won't find anything suitable to steal (provided there actually isn't
> any vcpu waiting in any runqueue with node-affinity with Z), while the
> CPU_BALANCE_STEP will find our vcpu. :-(
>
> So, what I wanted is something that could tell me whether the pcpu which is
> stealing work is the one that has actually been tickled to do so. I was then
> using the pcpu idleness as a (cheap and easy to check) indication of that,
> but I now see this is having side effects I did not want to cause in the
> first place.
>
> Sorry for that, I probably spent so much time buried, as you were saying, in
> the various nested loops and calls, that I lost the context a little bit! :-P
OK, that makes sense -- I figured it was something like that. Don't
feel too bad about missing that connection -- we're all fairly blind to
our own code, and I only caught it because I was trying to figure out
what was going on. That's why we do patch review. :-)
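
As a rough illustration of the scenario quoted above, here is a toy Python model of a two-pass steal. It is only a sketch under stated assumptions, not the sched_credit.c code: the names Vcpu, Pcpu, load_balance and run_scenario are hypothetical, the step labels simply echo NODE_BALANCE_STEP and CPU_BALANCE_STEP from the quoted text, and priorities, locking and IPIs are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Vcpu:
    name: str
    cpu_affinity: frozenset   # pcpus the vcpu is allowed to run on
    node_affinity: frozenset  # pcpus the vcpu would prefer to run on


@dataclass
class Pcpu:
    name: str
    runqueue: list = field(default_factory=list)


def load_balance(thief, pcpus):
    """Hypothetical two-pass steal: node-affinity first, then vcpu-affinity.

    Returns (stolen_vcpu, step), or (None, None) if nothing could be stolen.
    """
    for step in ("NODE_BALANCE_STEP", "CPU_BALANCE_STEP"):
        for peer in pcpus:
            if peer is thief:
                continue
            for v in peer.runqueue:
                if step == "NODE_BALANCE_STEP":
                    allowed = v.node_affinity
                else:
                    allowed = v.cpu_affinity
                if thief.name in allowed:
                    peer.runqueue.remove(v)
                    return v, step
    return None, None


def run_scenario():
    # The email only says X is in the node-affinity and Z is not.
    v = Vcpu("v",
             cpu_affinity=frozenset({"X", "Y", "Z"}),
             node_affinity=frozenset({"X"}))
    x, y, z = Pcpu("X"), Pcpu("Y"), Pcpu("Z")
    y.runqueue.append(v)                  # v wakes on busy Y; X gets tickled
    pcpus = [x, y, z]
    taken_by_z = load_balance(z, pcpus)   # Z reschedules before X reacts
    taken_by_x = load_balance(x, pcpus)   # X handles the IPI too late
    return taken_by_z, taken_by_x
```

In this model Z finds nothing in the node pass, takes the vcpu in the second pass, and X comes up empty afterwards, which is the outcome Dario wants to avoid.
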

Honestly, the whole "steal work" idea seemed a bit backwards to begin
with, but now that we're not just dealing with "possible" and "not
possible", but with "better" and "worse", the work-stealing method of
load balancing sort of falls down.

It does make sense to do the load-balancing work on idle cpus rather
than already-busy cpus; but I wonder if what should happen instead is
that before idling, a pcpu chooses a "busy" pcpu and does a global load
balancing for it -- i.e., pcpu 1 will look at pcpu 5's runqueue, and
consider moving away the vcpus on the runqueue not just to itself but to
any available cpu.

That way, in your example, Z might wake up, look at Y's runqueue, and
say, "This would probably run well on X -- I'll migrate it there."

But that's kind of a half-baked idea at this point.
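
To make the half-baked idea a little more concrete, here is a minimal Python sketch of what "global load balancing on behalf of a busy pcpu" could look like. Everything in it is an assumption for illustration: global_balance and its tuple layout are hypothetical, affinities are plain sets of pcpu names, and nothing here reflects how the credit scheduler would actually implement it.

```python
def global_balance(busy_runqueue, idle_pcpus):
    """Place each waiting vcpu on the best idle pcpu, not just the balancer.

    busy_runqueue: list of (vcpu_name, cpu_affinity, node_affinity) tuples
                   taken from the runqueue of the chosen busy pcpu.
    idle_pcpus:    names of idle pcpus, including the one doing the work.
    Returns {vcpu_name: pcpu_name}; vcpus that fit nowhere are left out.
    """
    free = list(idle_pcpus)
    placement = {}
    for name, cpu_aff, node_aff in busy_runqueue:
        # "Better": an idle pcpu inside the vcpu's node-affinity.
        preferred = [p for p in free if p in node_aff and p in cpu_aff]
        # "Possible": any idle pcpu allowed by the vcpu-affinity.
        allowed = [p for p in free if p in cpu_aff]
        choice = (preferred or allowed or [None])[0]
        if choice is not None:
            placement[name] = choice
            free.remove(choice)
    return placement
```

With Dario's naming (Y busy, X idle and in the node-affinity, Z idle and outside it), calling global_balance([("v", {"X", "Y", "Z"}, {"X"})], ["Z", "X"]) places the vcpu on X even though Z is the pcpu doing the balancing.
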
> Ok, I think the problem I was describing is real, and I've seen it happening
> and causing performance degradation. However, as I think a good solution is
> going to be more complex than I thought, I'd better repost without this
> function and deal with it in a future separate patch (after having figured
> out the best way of doing so). Is that fine with you?
Yes, that's fine. Thanks, Dario.
-George