From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: xen-devel <xen-devel@lists.xensource.com>,
"Keir (Xen.org)" <keir@xen.org>
Subject: Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
Date: Wed, 5 Dec 2012 12:16:52 +0000 [thread overview]
Message-ID: <50BF3B34.5030501@eu.citrix.com> (raw)
In-Reply-To: <dde3de6d81a3014f1d13.1354552498@Solace>
[-- Attachment #1: Type: text/plain, Size: 3868 bytes --]
On 03/12/12 16:34, Dario Faggioli wrote:
> Right now, when a VCPU wakes-up, we check if the it should preempt
> what is running on the PCPU, and whether or not the waking VCPU can
> be migrated (by tickling some idlers). However, this can result in
> suboptimal or even wrong behaviour, as explained here:
>
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html
>
> This change, instead, when deciding what PCPUs to tickle upon VCPU
> wake-up, considers both what it is likely to happen on the PCPU
> where the wakeup occurs, as well as whether or not there are idle
> PCPUs where to run the waking VCPU.
> In fact, if there are idlers where the new VCPU can run, we can
> avoid interrupting the running VCPU. OTOH, in case there aren't
> any of these PCPUs, preemption and migration are the way to go.
>
> This has been tested by running the following benchmarks inside 2,
> 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
> 960 MB of memory (host has 16 ways and 12 GB RAM).
>
> 1) All VMs had 'cpus="all"' in their config file.
>
> $ sysbench --test=cpu ... (time, lower is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
> | 6 | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
> | 10 | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
> $ sysbench --test=memory ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
> | 6 | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
> | 10 | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
> $ specjbb2005 ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 43150.63 +/- 1359.5616 | 42720.632 +/- 1937.4488 |
> | 6 | 29274.29 +/- 1024.4042 | 29518.171 +/- 1014.5239 |
> | 10 | 19061.28 +/- 512.88561 | 19050.141 +/- 458.77327 |
>
>
> 2) All VMs had their VCPUs statically pinned to the host's PCPUs.
>
> $ sysbench --test=cpu ... (time, lower is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 47.8211 +/- 0.0215504 | 47.826900 +/- 0.0077872 |
> | 6 | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
> | 10 | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
> $ sysbench --test=memory ... (throughput, higher is better)
> | VMs | w/o this change | w/ this change |
> | 2 | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
> | 6 | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
> | 10 | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
> $ specjbb2005 ... (throughput, higher is better)
> | 2 | 49591.057 +/- 952.93384 | 49610.98 +/- 1242.1675 |
> | 6 | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
> | 10 | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |
>
>
> Numbers show how the change has either no or very limited impact
> (specjbb2005 case) or, when it does have some impact, that is an
> actual improvement in performances, especially in the
> sysbench-memory case.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
So I think the principle is good, but the resulting set of "if"
statements is hard to figure out what's going on.
What do you think about re-arranging things, something like the attached?
This particular version I got rid of the stats, because they require
if() statements that break up the flow. If we really think they're
useful, maybe we could have a separate block somewhere for them?
We could actually do without the idlers_empty entirely, as if we just
remove the condition from the "else" block, the "right thing" will
happen; however, it means several unnecessary cpumask operations on a
busy system.
Thoughts?
-George
[-- Attachment #2: xen_sched_credit_improve_tickling_of_idle_cpus --]
[-- Type: text/plain, Size: 3773 bytes --]
xen: sched_credit, improve tickling of idle CPUs
RFC: Re-organized ifs
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,54 +249,53 @@ static inline void
struct csched_vcpu * const cur =
CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
- cpumask_t mask;
+ cpumask_t mask, idle_mask;
+ int idlers_empty;
ASSERT(cur);
cpumask_clear(&mask);
- /* If strictly higher priority than current VCPU, signal the CPU */
- if ( new->pri > cur->pri )
+ idlers_empty = cpumask_empty(prv->idlers);
+ /*
+ * If the pcpu is idle, or there are no idlers and the new
+ * vcpu is a higher priority than the old vcpu, run it here.
+ *
+ * If there are idle cpus, first try to find one suitable to run
+ * "new", so we can avoid preempting cur. If we cannot find a
+ * suitable idler on which to run "new", run it here, but try to
+ * find a suitable idler on which to run "cur" instead.
+ */
+ if ( cur->pri == CSCHED_PRI_IDLE
+ || (idlers_empty && new->pri > cur->pri) )
{
- if ( cur->pri == CSCHED_PRI_IDLE )
- SCHED_STAT_CRANK(tickle_local_idler);
- else if ( cur->pri == CSCHED_PRI_TS_OVER )
- SCHED_STAT_CRANK(tickle_local_over);
- else if ( cur->pri == CSCHED_PRI_TS_UNDER )
- SCHED_STAT_CRANK(tickle_local_under);
- else
- SCHED_STAT_CRANK(tickle_local_other);
-
cpumask_set_cpu(cpu, &mask);
}
+ else if (!idlers_empty)
+ {
+ /* Check whether or not there are idlers that can run new */
+ cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
- /*
- * If this CPU has at least two runnable VCPUs, we tickle any idlers to
- * let them know there is runnable work in the system...
- */
- if ( cur->pri > CSCHED_PRI_IDLE )
- {
- if ( cpumask_empty(prv->idlers) )
+ /* If there are no suitable idlers for new, and it's higher
+ * priority than cur, wake up the current cpu, but also
+ * look for idlers suitable for cur. */
+ if (cpumask_empty(&idle_mask) && new->pri > cur->pri)
{
- SCHED_STAT_CRANK(tickle_idlers_none);
+ cpumask_set_cpu(cpu, &mask);
+ cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
}
- else
+
+ /* Which of the idlers shall we wake up? */
+ if ( !cpumask_empty(&idle_mask) )
{
- cpumask_t idle_mask;
-
- cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
- if ( !cpumask_empty(&idle_mask) )
+ SCHED_STAT_CRANK(tickle_idlers_some);
+ if ( opt_tickle_one_idle )
{
- SCHED_STAT_CRANK(tickle_idlers_some);
- if ( opt_tickle_one_idle )
- {
- this_cpu(last_tickle_cpu) =
- cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
- cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
- }
- else
- cpumask_or(&mask, &mask, &idle_mask);
+ this_cpu(last_tickle_cpu) =
+ cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+ cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
}
- cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
+ else
+ cpumask_or(&mask, &mask, &idle_mask);
}
}
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2012-12-05 12:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
2012-12-03 17:12 ` Ian Campbell
2012-12-03 18:26 ` Dario Faggioli
2012-12-05 12:16 ` George Dunlap [this message]
2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
2012-12-04 18:53 ` George Dunlap
2012-12-04 18:55 ` George Dunlap
2012-12-05 11:57 ` Dario Faggioli
2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
2012-12-04 19:10 ` George Dunlap
2012-12-05 11:54 ` Dario Faggioli
2012-12-05 11:51 ` George Dunlap
2012-12-05 12:01 ` Ian Campbell
2012-12-05 12:15 ` Dario Faggioli
2012-12-05 12:20 ` Ian Campbell
2012-12-05 12:25 ` George Dunlap
2012-12-05 12:38 ` Mats Petersson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50BF3B34.5030501@eu.citrix.com \
--to=george.dunlap@eu.citrix.com \
--cc=keir@xen.org \
--cc=raistlin@linux.it \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.