From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Marcus Granado <Marcus.Granado@eu.citrix.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Anil Madhavapeddy <anil@recoil.org>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>,
Daniel De Graaf <dgdegra@tycho.nsa.gov>,
Matt Wilson <msw@amazon.com>
Subject: Re: [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity
Date: Fri, 21 Dec 2012 14:29:09 +0000 [thread overview]
Message-ID: <50D47235.4090106@eu.citrix.com> (raw)
In-Reply-To: <CAAWQectVEihrayJj5n4SPGqA0QJSiC7s2x_oDW=KHyxukWpMSA@mail.gmail.com>
On 20/12/12 18:18, Dario Faggioli wrote:
> On Thu, Dec 20, 2012 at 5:48 PM, George Dunlap
> <george.dunlap@eu.citrix.com> wrote:
>> And in any case, looking at the caller of csched_load_balance(), it
>> explicitly says to steal work if the next thing on the runqueue of cpu has a
>> priority of TS_OVER. That was chosen for a reason -- if you want to change
>> that, you should change it there at the top (and make a justification for
>> doing so), not deeply nested in a function like this.
>>
>> Or am I completely missing something?
>>
> No, you're right. Trying to solve a nasty issue I was seeing, I overlooked I was
> changing the underlying logic until that point... Thanks!
>
> What I want to avoid is the following: a vcpu wakes-up on the busy pcpu Y. As
> a consequence, the idle pcpu X is tickled. Then, for any unrelated reason, pcpu
> Z reschedules and, as it would go idle too, it looks around for any
> vcpu to steal,
> finds one in Y's runqueue and grabs it. Afterward, when X gets the IPI and
> schedules, it just does not find anyone to run and goes back idling.
>
> Now, suppose the vcpu has X, but *not* Z, in its node-affinity (while
> it has a full
> vcpu-affinity, i.e., can run everywhere). In this case, a vcpu that
> could have run on
> a pcpu in its node-affinity, executes outside from it. That happens because,
> the NODE_BALANCE_STEP in csched_load_balance(), when called by Z, won't
> find anything suitable to steal (provided there actually isn't any
> vcpu waiting in
> any runqueue with node-affinity with Z), while the CPU_BALANCE_STEP will
> find our vcpu. :-(
>
> So, what I wanted is something that could tell me whether the pcpu which is
> stealing work is the one that has actually been tickled to do so. I
> was then using
> the pcpu idleness as a (cheap and easy to check) indication of that,
> but I now see
> this is having side effects I in the first place did not want to cause.
>
> Sorry for that, I probably spent so much time buried, as you where
> saying, in the
> various nested loops and calls, that I lost the context a little bit! :-P
OK, that makes sense -- I figured it was something like that. Don't
feel too bad about missing that connection -- we're all fairly blind to
our own code, and I only caught it because I was trying to figure out
what was going on. That's why we do patch review. :-)
Honestly, the whole "steal work" idea seemed a bit backwards to begin
with, but now that we're not just dealing with "possible" and "not
possible", but with "better" and "worse", the work-stealing method of
load balancing sort of falls down.
It does make sense to do the load-balancing work on idle cpus rather
than already-busy cpus; but I wonder if what should happen instead is
that before idling, a pcpu chooses a "busy" pcpu and does a global load
balancing for it -- i.e., pcpu 1 will look at pcpu 5's runqueue, and
consider moving away the vcpus on the runqueue not just to itself but to
any available cpu.
That way, in your example, Z might wake up, look at X's runqueue, and
say, "This would probably run well on Y -- I'll migrate it there."
But that's kind of a half-baked idea at this point.
> Ok, I think the problem I was describing is real, and I've seen it happening and
> causing performances degradation. However, as I think a good solution
> is going to
> be more complex than I thought, I'd better repost without this
> function and deal with
> it in a future separate patch (after having figured out the best way
> of doing so). Is
> that fine with you?
Yes, that's fine. Thanks, Dario.
-George
next prev parent reply other threads:[~2012-12-21 14:29 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-19 19:07 [PATCH 00 of 10 v2] NUMA aware credit scheduling Dario Faggioli
2012-12-19 19:07 ` [PATCH 01 of 10 v2] xen, libxc: rename xenctl_cpumap to xenctl_bitmap Dario Faggioli
2012-12-20 9:17 ` Jan Beulich
2012-12-20 9:35 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 02 of 10 v2] xen, libxc: introduce node maps and masks Dario Faggioli
2012-12-20 9:18 ` Jan Beulich
2012-12-20 9:55 ` Dario Faggioli
2012-12-20 14:33 ` George Dunlap
2012-12-20 14:52 ` Jan Beulich
2012-12-20 15:13 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity Dario Faggioli
2012-12-20 6:44 ` Juergen Gross
2012-12-20 8:16 ` Dario Faggioli
2012-12-20 8:25 ` Juergen Gross
2012-12-20 8:33 ` Dario Faggioli
2012-12-20 8:39 ` Juergen Gross
2012-12-20 8:58 ` Dario Faggioli
2012-12-20 15:28 ` George Dunlap
2012-12-20 16:00 ` Dario Faggioli
2012-12-20 9:22 ` Jan Beulich
2012-12-20 15:56 ` George Dunlap
2012-12-20 17:12 ` Dario Faggioli
2012-12-20 16:48 ` George Dunlap
2012-12-20 18:18 ` Dario Faggioli
2012-12-21 14:29 ` George Dunlap [this message]
2012-12-21 16:07 ` Dario Faggioli
2012-12-20 20:21 ` George Dunlap
2012-12-21 0:18 ` Dario Faggioli
2012-12-21 14:56 ` George Dunlap
2012-12-21 16:13 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 04 of 10 v2] xen: allow for explicitly specifying node-affinity Dario Faggioli
2012-12-21 15:17 ` George Dunlap
2012-12-21 16:17 ` Dario Faggioli
2013-01-03 16:05 ` Daniel De Graaf
2012-12-19 19:07 ` [PATCH 05 of 10 v2] libxc: " Dario Faggioli
2012-12-21 15:19 ` George Dunlap
2012-12-21 16:27 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 06 of 10 v2] libxl: " Dario Faggioli
2012-12-21 15:30 ` George Dunlap
2012-12-21 16:18 ` Dario Faggioli
2012-12-21 17:02 ` Ian Jackson
2012-12-21 17:09 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 07 of 10 v2] libxl: optimize the calculation of how many VCPUs can run on a candidate Dario Faggioli
2012-12-20 8:41 ` Ian Campbell
2012-12-20 9:24 ` Dario Faggioli
2012-12-21 16:00 ` George Dunlap
2012-12-21 16:23 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 08 of 10 v2] libxl: automatic placement deals with node-affinity Dario Faggioli
2012-12-21 16:22 ` George Dunlap
2012-12-19 19:07 ` [PATCH 09 of 10 v2] xl: add node-affinity to the output of `xl list` Dario Faggioli
2012-12-21 16:34 ` George Dunlap
2012-12-21 16:54 ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 10 of 10 v2] docs: rearrange and update NUMA placement documentation Dario Faggioli
2012-12-19 23:16 ` [PATCH 00 of 10 v2] NUMA aware credit scheduling Dario Faggioli
2013-01-11 12:19 ` Ian Campbell
2013-01-11 13:57 ` Dario Faggioli
2013-01-11 14:09 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50D47235.4090106@eu.citrix.com \
--to=george.dunlap@eu.citrix.com \
--cc=Andrew.Cooper3@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=Ian.Jackson@eu.citrix.com \
--cc=JBeulich@suse.com \
--cc=Marcus.Granado@eu.citrix.com \
--cc=anil@recoil.org \
--cc=dan.magenheimer@oracle.com \
--cc=dario.faggioli@citrix.com \
--cc=dgdegra@tycho.nsa.gov \
--cc=juergen.gross@ts.fujitsu.com \
--cc=msw@amazon.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.