From: Dario Faggioli <raistlin@linux.it>
To: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Keir Fraser <keir.xen@gmail.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
xen-devel <xen-devel@lists.xen.org>
Subject: Re: About vcpu wakeup and runq tickling in credit
Date: Fri, 16 Nov 2012 11:53:54 +0100 [thread overview]
Message-ID: <1353063234.5351.107.camel@Solace> (raw)
In-Reply-To: <50A4DD95.5020107@eu.citrix.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 4033 bytes --]
(Cc-ing David as it looks like he uses xenalyze quite a bit, and I'm
seeking any advice on how to squeeze data out of it too :-P)
On Thu, 2012-11-15 at 12:18 +0000, George Dunlap wrote:
> Maybe what we should do is do the wake-up based on who is likely to run
> on the current cpu: i.e., if "current" is likely to be pre-empted, look
> at idlers based on "current"'s mask; if "new" is likely to be put on the
> queue, look at idlers based on "new"'s mask.
>
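In code, that suggestion boils down to something like this (just a
sketch of the relevant bit of credit's __runq_tickle(); the actual
change is in the second attached patch):

    if ( new->pri > cur->pri )
        /* new will likely preempt cur here: tickle idlers where cur can run */
        cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
    else
        /* cur will likely keep running here: tickle idlers where new can run */
        cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);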
Ok, find attached the two (trivial) patches that I produced and have
been testing these days. Unfortunately, early results show that I/we
might be missing something.
In fact, although I don't yet have the numbers for the NUMA-aware
scheduling case (which is what originated all this! :-D), comparing
'upstream' and 'patched' (namely, 'upstream' plus the two attached
patches) I can spot some perf regressions. :-(
Here are the results of running some benchmarks on 2, 6 and 10 VMs.
Each VM has 2 VCPUs, and the VMs run the benchmarks concurrently on a
16-CPU host. (Each test is repeated 3 times, and the average +/- stddev
is what is reported.)
Also, the VCPUs were statically pinned to the host's PCPUs. As already
said, numbers for no-pinning and NUMA-scheduling will follow.
+ sysbench --test=memory (throughput in MB/s, higher is better)
  #VMs |        upstream          |         patched
     2 | 550.97667 +/- 2.3512355  | 540.185   +/- 21.416892
     6 | 443.15    +/- 5.7471797  | 442.66389 +/- 2.1071732
    10 | 313.89233 +/- 1.3237493  | 305.69567 +/- 0.3279853

+ sysbench --test=cpu (time in seconds, lower is better)
  #VMs |        upstream          |         patched
     2 | 47.8211   +/- 0.0215503  | 47.816117 +/- 0.0174079
     6 | 62.689122 +/- 0.0877172  | 62.789883 +/- 0.1892171
    10 | 90.321097 +/- 1.4803867  | 91.197767 +/- 0.1032667

+ specjbb2005 (throughput, higher is better)
  #VMs |        upstream          |         patched
     2 | 49591.057 +/- 952.93384  | 50008.28  +/- 1502.4863
     6 | 33538.247 +/- 1089.2115  | 33647.873 +/- 1007.3538
    10 | 21927.87  +/- 831.88742  | 21869.654 +/- 578.236
So, as you can easily see, the numbers are very similar, with cases
where the patches produce a slight performance reduction, while I was
expecting the opposite, i.e., similar numbers, but a little bit better
with the patches.
For most of the runs of all the benchmarks, I have the full traces
(although only for SCHED-* events, IIRC), so I can investigate further.
It's a huge amount of data, though, so it's really hard to make sense
of it, and any advice or direction on that would be much appreciated.
For instance, looking at one of the runs of sysbench-memory, here's what
I found. With 10 VMs, the memory throughput reported by one of the VMs
during one of the runs is as follows:
upstream: 315.68 MB/s
patched: 306.69 MB/s
I then went through the traces and I found out that the patched case
lasted longer (for transferring the same amount of memory, hence the
lower throughput), but with the following runstate-related results:
  upstream: running  for 73.67% of the time
            runnable for 24.94% of the time
  patched:  running  for 74.57% of the time
            runnable for 24.10% of the time
And that is consistent with other random instances I checked. So it
looks like the patches are, after all, doing their job of increasing
(at least a little) the running time of the various VCPUs, at the
expense of their runnable time, but the benefit of that is being
entirely eaten by some other effect (to the point that sometimes things
go even worse) which I'm not able to identify... For now! :-P
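(As a sanity check on the numbers above, assuming throughput scales
linearly with the fraction of time spent running: 315.68 * 74.57/73.67
~= 319.5 MB/s is what the patched case should have delivered, while it
actually delivered 306.69 MB/s, i.e., roughly 4% less than that. So
whatever that other effect is, it costs more than the extra running
time buys.)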
Any ideas about what's going on, and what I should check in order to
figure that out?
Thanks a lot and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[-- Attachment #1.1.2: xen-sched_credit-clarify-cpumask-and-during-tickle.patch --]
[-- Type: text/x-patch, Size: 936 bytes --]
# HG changeset patch
# Parent b0c342b749765bf254c664883d4f5e2891c1ff18
diff -r b0c342b74976 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Fri Nov 09 11:02:54 2012 +0100
+++ b/xen/common/sched_credit.c Thu Nov 15 18:22:56 2012 +0100
@@ -254,7 +254,11 @@ static inline void
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
+    /*
+     * If new is strictly higher priority than current VCPU, let CPU
+     * know that re-scheduling is needed. That will likely pick up new
+     * and put cur back in the runqueue.
+     */
     if ( new->pri > cur->pri )
     {
         if ( cur->pri == CSCHED_PRI_IDLE )
@@ -296,7 +300,6 @@ static inline void
             else
                 cpumask_or(&mask, &mask, &idle_mask);
         }
-        cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
     }
 }
 
[-- Attachment #1.1.3: xen-sched_credit-fix-tickling --]
[-- Type: text/plain, Size: 1435 bytes --]
# HG changeset patch
# Parent 3a70bd1d02c1334857c84c9fb5e1dd22b6603a2c
diff -r 3a70bd1d02c1 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Nov 15 18:22:56 2012 +0100
+++ b/xen/common/sched_credit.c Thu Nov 15 19:03:19 2012 +0100
@@ -274,7 +274,7 @@ static inline void
     }
 
     /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
+     * If this CPU has at least two runnable VCPUs, we tickle some idlers to
      * let them know there is runnable work in the system...
      */
     if ( cur->pri > CSCHED_PRI_IDLE )
@@ -287,7 +287,17 @@ static inline void
         {
             cpumask_t idle_mask;
 
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+            /*
+             * Which idlers do we want to tickle? If new has higher priority,
+             * it will likely preempt cur and run here. We then need someone
+             * where cur can run to come and pick it up. Vice versa, if it is
+             * cur that stays, we poke idlers where new can run.
+             */
+            if ( new->pri > cur->pri )
+                cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
+            else
+                cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+
             if ( !cpumask_empty(&idle_mask) )
             {
                 SCHED_STAT_CRANK(tickle_idlers_some);