From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: xen-devel <xen-devel@lists.xensource.com>,
"Keir (Xen.org)" <keir@xen.org>
Subject: Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
Date: Wed, 5 Dec 2012 12:16:52 +0000 [thread overview]
Message-ID: <50BF3B34.5030501@eu.citrix.com> (raw)
In-Reply-To: <dde3de6d81a3014f1d13.1354552498@Solace>
[-- Attachment #1: Type: text/plain, Size: 3868 bytes --]
On 03/12/12 16:34, Dario Faggioli wrote:
> Right now, when a VCPU wakes-up, we check if the it should preempt
> what is running on the PCPU, and whether or not the waking VCPU can
> be migrated (by tickling some idlers). However, this can result in
> suboptimal or even wrong behaviour, as explained here:
>
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html
>
> This change, instead, when deciding what PCPUs to tickle upon VCPU
> wake-up, considers both what it is likely to happen on the PCPU
> where the wakeup occurs, as well as whether or not there are idle
> PCPUs where to run the waking VCPU.
> In fact, if there are idlers where the new VCPU can run, we can
> avoid interrupting the running VCPU. OTOH, in case there aren't
> any of these PCPUs, preemption and migration are the way to go.
>
> This has been tested by running the following benchmarks inside 2,
> 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
> 960 MB of memory (host has 16 ways and 12 GB RAM).
>
> 1) All VMs had 'cpus="all"' in their config file.
>
> $ sysbench --test=cpu ... (time, lower is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
> | 6 | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
> | 10 | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
> $ sysbench --test=memory ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
> | 6 | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
> | 10 | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
> $ specjbb2005 ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 43150.63 +/- 1359.5616 | 42720.632 +/- 1937.4488 |
> | 6 | 29274.29 +/- 1024.4042 | 29518.171 +/- 1014.5239 |
> | 10 | 19061.28 +/- 512.88561 | 19050.141 +/- 458.77327 |
>
>
> 2) All VMs had their VCPUs statically pinned to the host's PCPUs.
>
> $ sysbench --test=cpu ... (time, lower is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 47.8211 +/- 0.0215504 | 47.826900 +/- 0.0077872 |
> | 6 | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
> | 10 | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
> $ sysbench --test=memory ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
> | 6 | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
> | 10 | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
> $ specjbb2005 ... (throughput, higher is better)
> | 2 | 49591.057 +/- 952.93384 | 49610.98 +/- 1242.1675 |
> | 6 | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
> | 10 | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |
>
>
> Numbers show how the change has either no or very limited impact
> (specjbb2005 case) or, when it does have some impact, that is an
> actual improvement in performances, especially in the
> sysbench-memory case.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
So I think the principle is good, but the resulting set of "if"
statements is hard to figure out what's going on.
What do you think about re-arranging things, something like the attached?
This particular version I got rid of the stats, because they require
if() statements that break up the flow. If we really think they're
useful, maybe we could have a separate block somewhere for them?
We could actually do without the idlers_empty entirely, as if we just
remove the condition from the "else" block, the "right thing" will
happen; however, it means several unnecessary cpumask operations on a
busy system.
Thoughts?
-George
[-- Attachment #2: xen_sched_credit_improve_tickling_of_idle_cpus --]
[-- Type: text/plain, Size: 3773 bytes --]
xen: sched_credit, improve tickling of idle CPUs
RFC: Re-organized ifs
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,54 +249,53 @@ static inline void
struct csched_vcpu * const cur =
CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
- cpumask_t mask;
+ cpumask_t mask, idle_mask;
+ int idlers_empty;
ASSERT(cur);
cpumask_clear(&mask);
- /* If strictly higher priority than current VCPU, signal the CPU */
- if ( new->pri > cur->pri )
+ idlers_empty = cpumask_empty(prv->idlers);
+ /*
+ * If the pcpu is idle, or there are no idlers and the new
+ * vcpu is a higher priority than the old vcpu, run it here.
+ *
+ * If there are idle cpus, first try to find one suitable to run
+ * "new", so we can avoid preempting cur. If we cannot find a
+ * suitable idler on which to run "new", run it here, but try to
+ * find a suitable idler on which to run "cur" instead.
+ */
+ if ( cur->pri == CSCHED_PRI_IDLE
+ || (idlers_empty && new->pri > cur->pri) )
{
- if ( cur->pri == CSCHED_PRI_IDLE )
- SCHED_STAT_CRANK(tickle_local_idler);
- else if ( cur->pri == CSCHED_PRI_TS_OVER )
- SCHED_STAT_CRANK(tickle_local_over);
- else if ( cur->pri == CSCHED_PRI_TS_UNDER )
- SCHED_STAT_CRANK(tickle_local_under);
- else
- SCHED_STAT_CRANK(tickle_local_other);
-
cpumask_set_cpu(cpu, &mask);
}
+ else if (!idlers_empty)
+ {
+ /* Check whether or not there are idlers that can run new */
+ cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
- /*
- * If this CPU has at least two runnable VCPUs, we tickle any idlers to
- * let them know there is runnable work in the system...
- */
- if ( cur->pri > CSCHED_PRI_IDLE )
- {
- if ( cpumask_empty(prv->idlers) )
+ /* If there are no suitable idlers for new, and it's higher
+ * priority than cur, wake up the current cpu, but also
+ * look for idlers suitable for cur. */
+ if (cpumask_empty(&idle_mask) && new->pri > cur->pri)
{
- SCHED_STAT_CRANK(tickle_idlers_none);
+ cpumask_set_cpu(cpu, &mask);
+ cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
}
- else
+
+ /* Which of the idlers shall we wake up? */
+ if ( !cpumask_empty(&idle_mask) )
{
- cpumask_t idle_mask;
-
- cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
- if ( !cpumask_empty(&idle_mask) )
+ SCHED_STAT_CRANK(tickle_idlers_some);
+ if ( opt_tickle_one_idle )
{
- SCHED_STAT_CRANK(tickle_idlers_some);
- if ( opt_tickle_one_idle )
- {
- this_cpu(last_tickle_cpu) =
- cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
- cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
- }
- else
- cpumask_or(&mask, &mask, &idle_mask);
+ this_cpu(last_tickle_cpu) =
+ cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+ cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
}
- cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
+ else
+ cpumask_or(&mask, &mask, &idle_mask);
}
}
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2012-12-05 12:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
2012-12-03 17:12 ` Ian Campbell
2012-12-03 18:26 ` Dario Faggioli
2012-12-05 12:16 ` George Dunlap [this message]
2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
2012-12-04 18:53 ` George Dunlap
2012-12-04 18:55 ` George Dunlap
2012-12-05 11:57 ` Dario Faggioli
2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
2012-12-04 19:10 ` George Dunlap
2012-12-05 11:54 ` Dario Faggioli
2012-12-05 11:51 ` George Dunlap
2012-12-05 12:01 ` Ian Campbell
2012-12-05 12:15 ` Dario Faggioli
2012-12-05 12:20 ` Ian Campbell
2012-12-05 12:25 ` George Dunlap
2012-12-05 12:38 ` Mats Petersson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50BF3B34.5030501@eu.citrix.com \
--to=george.dunlap@eu.citrix.com \
--cc=keir@xen.org \
--cc=raistlin@linux.it \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).