From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Keir Fraser <keir.xen@gmail.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
xen-devel <xen-devel@lists.xen.org>
Subject: Re: About vcpu wakeup and runq tickling in credit
Date: Fri, 16 Nov 2012 15:44:03 +0000
Message-ID: <50A65F43.1030309@eu.citrix.com>
In-Reply-To: <1353067252.5351.124.camel@Solace>
On 16/11/12 12:00, Dario Faggioli wrote:
> On Fri, 2012-11-16 at 11:53 +0100, Dario Faggioli wrote:
>> On Thu, 2012-11-15 at 12:18 +0000, George Dunlap wrote:
>>> Maybe what we should do is do the wake-up based on who is likely to run
>>> on the current cpu: i.e., if "current" is likely to be pre-empted, look
>>> at idlers based on "current"'s mask; if "new" is likely to be put on the
>>> queue, look at idlers based on "new"'s mask.
>>>
>> Ok, find attached the two (trivial) patches that I produced and have
>> been testing these days. Unfortunately, early results show that I/we
>> might be missing something.
>>
> I just came to think that this approach, although more, say,
> correct, could have a bad impact on caches and locality in general.
One thing that xenalyze will already tell you is statistics on how a
vcpu migrates over pcpus. For example:
cpu affinity: 242 7009916158 {621089444|5643356292|19752063006}
[0]: 15 6940230676 {400952|5643531152|27013831272}
[1]: 19 6366861827 {117462|5031404806|19751998114}
[2]: 31 6888557514 {1410800684|5643015454|19752100009}
[3]: 18 7790887470 {109764|5920027975|25395539566}
...
The general format is: "$number $average_cycles {5th percentile|50th
percentile|95th percentile}". The first line includes samples from
*all* cpus (i.e., it migrated a total of 242 times, averaging 7
billion cycles each time); the subsequent lines show statistics on
specific pcpus (i.e., it had 15 sessions on pcpu 0, averaging 6.94
billion cycles, &c).
You should be able to use this to do a basic verification of your
hypothesis that vcpus are migrating more often.
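If you want to compare runs mechanically, here is a minimal sketch of reading those lines, assuming the output looks exactly like the sample above. The AffinityStat type and parse_affinity() function are hypothetical helpers of mine, not part of xenalyze.

```python
import re
from typing import List, NamedTuple, Optional


class AffinityStat(NamedTuple):
    pcpu: Optional[int]  # None for the "cpu affinity:" summary line
    count: int           # number of sessions (migrations, on the summary line)
    avg_cycles: int
    p5: int
    p50: int
    p95: int


# Matches either "cpu affinity: ..." or "[<pcpu>]: ...", followed by
# "<count> <avg> {<p5>|<p50>|<p95>}". Format assumed from the sample only.
_LINE = re.compile(
    r"^\s*(?:cpu affinity|\[(?P<pcpu>\d+)\]):\s+"
    r"(?P<count>\d+)\s+(?P<avg>\d+)\s+"
    r"\{(?P<p5>\d+)\|(?P<p50>\d+)\|(?P<p95>\d+)\}\s*$"
)


def parse_affinity(text: str) -> List[AffinityStat]:
    """Return one AffinityStat per recognised line; other lines are skipped."""
    stats = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue
        pcpu = m.group("pcpu")
        stats.append(AffinityStat(
            None if pcpu is None else int(pcpu),
            int(m.group("count")), int(m.group("avg")),
            int(m.group("p5")), int(m.group("p50")), int(m.group("p95")),
        ))
    return stats


SAMPLE = """\
cpu affinity: 242 7009916158 {621089444|5643356292|19752063006}
[0]: 15 6940230676 {400952|5643531152|27013831272}
[1]: 19 6366861827 {117462|5031404806|19751998114}
...
"""

summary, *per_pcpu = parse_affinity(SAMPLE)
print("migrations:", summary.count, "avg cycles:", summary.avg_cycles)
```

Comparing summary.count between a baseline run and a patched run of similar length would give a first indication of whether the vcpu is migrating more often.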
> In fact, suppose a new vcpu N wakes up on pcpu #x where another vcpu C
> is running, with prio(N)>prio(C).
>
> What upstream does is ask #x and all the idlers that can
> execute N to reschedule. Doing both is, I think, wrong, as there's the
> chance of ending up with N being scheduled on #x and C being runnable
> but not running (in #x's runqueue) even if there are idle cpus that
> could run it, as they're not poked (as already and repeatedly said).
>
> What the patches do, in this case (remember, prio(N)>prio(C)), is ask
> #x and all the idlers that can run C to reschedule, the effect being
> that N will likely run on #x, after a context switch, and C will run
> somewhere else, after a migration, potentially wasting its cache-hotness
> (it is running after all!).
>
> It looks like we can do better... Something like the below:
> + if there are no idlers where N can run, ask #x and the idlers where
> C can run to reschedule (exactly what the patches do, although, they
> do that _unconditionally_), as there isn't anything else we can do
> to try to make sure they both will run;
> + if *there*are* idlers where N can run, _do_not_ ask #x to reschedule
> and only poke them to come pick N up. In fact, in this case, it is
> not necessary to send C away to have both vcpus running, and
> it seems better to have N experience the migration since, as it's
> waking up, it's more likely than C to be cache-cold.
I think that makes a lot of sense -- look forward to seeing the results. :-)
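To make sure I'm reading the proposal right, here is a rough sketch of the decision, with cpumasks modelled as Python sets. The function name, the arguments, and the convention that a larger number means higher priority are all assumptions for illustration; this is not the credit scheduler code. The branch for prio(N) <= prio(C) follows my earlier suggestion of looking at idlers in "new"'s mask when "new" is likely to be queued.

```python
from typing import Set


def cpus_to_tickle(x: int,
                   idlers: Set[int],
                   affinity_new: Set[int],
                   affinity_cur: Set[int],
                   prio_new: int,
                   prio_cur: int) -> Set[int]:
    """Pick the pcpus to ask to reschedule when vcpu N wakes on pcpu x,
    where vcpu C is currently running. Larger prio means higher priority
    (an assumption of this sketch)."""
    idlers_for_new = idlers & affinity_new

    if prio_new <= prio_cur:
        # N will be queued rather than pre-empting C: only poke idlers
        # that are allowed to run N.
        return idlers_for_new

    if idlers_for_new:
        # Some idler can pick N up: leave C alone on x (it is cache-hot)
        # and let N, which is waking up anyway, take the migration.
        return idlers_for_new

    # No idler can run N: pre-empt C on x, and poke idlers that can run C
    # so it does not sit runnable in x's runqueue.
    return {x} | (idlers & affinity_cur)


# N can run on pcpus 0-1, C on 0 and 2; pcpus 2 and 3 are idle, so no idler
# can take N: tickle x (pcpu 0) and the idler that can take C.
print(sorted(cpus_to_tickle(0, {2, 3}, {0, 1}, {0, 2}, prio_new=1, prio_cur=0)))
```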
There may be some other tricks we could look at. For example, if N and
C are both going to do a significant chunk of computation, then this
strategy will work best. But suppose that C does a significant chunk of
computation, but N is only going to run for a few hundred microseconds
and then go to sleep again? In that case, it may be easier to just run
N on the current processor and not bother with IPIs and such; C will run
again in a few hundred microseconds. Conversely, if N will do a significant
chunk of work but C is fairly short, we might as well let C continue
running, as N will shortly get to run.
How to know if the next time this vcpu runs will be long or short? We
could try tracking the runtimes of the last few times (maybe 3 or 5) this
vcpu was scheduled, and using that to predict the results.
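Something along these lines, as a very rough sketch: keep a small window of recent runtimes per vcpu and use their mean as the prediction. The class name, the choice of a plain mean, and the 500 microsecond cut-off are all placeholders I made up; a real implementation would live in the scheduler and might well want a different estimator.

```python
from collections import deque
from typing import Optional


class RuntimeHistory:
    """Remembers the last `window` runtimes (in microseconds) of one vcpu."""

    def __init__(self, window: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest sample when full.
        self._runs = deque(maxlen=window)

    def record(self, runtime_us: float) -> None:
        self._runs.append(runtime_us)

    def predict(self) -> Optional[float]:
        """Mean of the remembered runtimes, or None with no history yet."""
        if not self._runs:
            return None
        return sum(self._runs) / len(self._runs)


def likely_short(history: RuntimeHistory, threshold_us: float = 500.0) -> bool:
    """True if the vcpu is predicted to run for less than threshold_us.
    With no history we make no claim and return False."""
    predicted = history.predict()
    return predicted is not None and predicted < threshold_us


h = RuntimeHistory(window=3)
for runtime in (100, 200, 300, 400):  # the oldest sample (100) is dropped
    h.record(runtime)
print(h.predict(), likely_short(h))
```

If the waking vcpu looks short and "current" looks long, we would just run it on the current processor and skip the IPIs; in the opposite case we would let "current" finish.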
Do you have traces for any of those runs you did? I might just take a
look at them and see if I can make an analysis of cache "temperature"
wrt scheduling. :-)
-George