xen-devel.lists.xenproject.org archive mirror
From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: xen-devel <xen-devel@lists.xensource.com>,
	"Keir (Xen.org)" <keir@xen.org>
Subject: Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
Date: Wed, 5 Dec 2012 12:16:52 +0000
Message-ID: <50BF3B34.5030501@eu.citrix.com>
In-Reply-To: <dde3de6d81a3014f1d13.1354552498@Solace>

[-- Attachment #1: Type: text/plain, Size: 3868 bytes --]

On 03/12/12 16:34, Dario Faggioli wrote:
> Right now, when a VCPU wakes up, we check whether it should preempt
> what is running on the PCPU, and whether or not the waking VCPU can
> be migrated (by tickling some idlers). However, this can result in
> suboptimal or even wrong behaviour, as explained here:
>
>   http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html
>
> With this change, when deciding which PCPUs to tickle upon VCPU
> wake-up, we consider both what is likely to happen on the PCPU
> where the wakeup occurs and whether or not there are idle
> PCPUs on which the waking VCPU can run.
> In fact, if there are idlers where the new VCPU can run, we can
> avoid interrupting the running VCPU. OTOH, if there are no such
> PCPUs, preemption and migration are the way to go.
>
> This has been tested by running the following benchmarks inside 2,
> 6 and 10 VMs concurrently on a shared host, each VM with 2 VCPUs and
> 960 MB of memory (the host has 16 ways and 12 GB of RAM).
>
> 1) All VMs had 'cpus="all"' in their config file.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
>   | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
>   | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
>   | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
>   | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
> $ specjbb2005 ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
>   | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
>   | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |
>
>
> 2) All VMs had their VCPUs statically pinned to the host's PCPUs.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
>   | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
>   | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
>   | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
>   | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
> $ specjbb2005 ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
>   | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
>   | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |
>
>
> The numbers show that the change has either no or very limited
> impact (the specjbb2005 case) or, where it does have an impact, that
> the impact is an actual improvement in performance, especially in
> the sysbench-memory case.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
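
As a rough check of "especially in the sysbench-memory case", the relative change of the means in the first sysbench-memory table (cpus="all") can be computed directly. This is only a sketch: rel_change_pct, mem_all and print_mem_all are hypothetical names, and the +/- spreads are ignored, so it says nothing about statistical significance.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change, in percent, from the "w/o" mean to the "w/" mean. */
static double rel_change_pct(double without, double with)
{
    assert(without != 0.0);
    return (with - without) / without * 100.0;
}

/* sysbench --test=memory means (throughput), case 1: cpus="all". */
static const struct { int vms; double without, with; } mem_all[] = {
    {  2, 485.56333, 525.57833 },
    {  6, 401.36278, 421.96111 },
    { 10, 294.43933, 302.49033 },
};

/* Print the relative throughput change for each VM count. */
static void print_mem_all(void)
{
    for ( unsigned i = 0; i < sizeof(mem_all) / sizeof(mem_all[0]); i++ )
        printf("%2d VMs: %+.2f%%\n", mem_all[i].vms,
               rel_change_pct(mem_all[i].without, mem_all[i].with));
}
```

print_mem_all() gives about +8.24%, +5.13% and +2.73% for 2, 6 and 10 VMs. The same arithmetic on the pinned sysbench-memory table gives about -0.02%, +2.45% and +2.52%. For 2 unpinned VMs, note that the w/ spread (+/- 25.09) is a sizable fraction of the 40-unit gain.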

So I think the principle is good, but with the resulting set of "if"
statements it is hard to figure out what's going on.

What do you think about re-arranging things, something like the attached?

In this particular version I got rid of the stats, because they require
if() statements that break up the flow.  If we really think they're
useful, maybe we could have a separate block somewhere for them?

We could actually do without idlers_empty entirely: if we just removed
the condition from the "else" block, the "right thing" would happen;
however, that would mean several unnecessary cpumask operations on a
busy system.
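
To sanity-check the claim that dropping idlers_empty would not change the outcome, here is a minimal user-space model of the reorganized logic. It is not Xen code. Assumptions: cpumask_t is replaced by one bit per PCPU in a uint64_t, the PRI_* constants are stand-ins whose only relevant property is that idle is the lowest priority, only the opt_tickle_one_idle-off path (tickle every suitable idler) is modelled, and tickle_mask, cpu_bit and count_mismatches are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a uint64_t bitmap instead of cpumask_t. */
typedef uint64_t mask_t;

/* Stand-in priorities; only the ordering (idle lowest) matters here. */
enum { PRI_IDLE = -64, PRI_TS_OVER = -2, PRI_TS_UNDER = -1, PRI_TS_BOOST = 0 };

static mask_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return UINT64_C(1) << cpu;
}

/*
 * PCPUs to tickle when a VCPU (new_pri, new_aff) wakes on `cpu`, where
 * a VCPU (cur_pri, cur_aff) is running. With use_shortcut the patch's
 * idlers_empty tests are applied; without it the else branch always runs.
 */
static mask_t tickle_mask(int cpu, int cur_pri, int new_pri, mask_t idlers,
                          mask_t new_aff, mask_t cur_aff, bool use_shortcut)
{
    bool idlers_empty = (idlers == 0);
    mask_t mask = 0, idle_mask;

    if ( cur_pri == PRI_IDLE
         || (use_shortcut && idlers_empty && new_pri > cur_pri) )
    {
        mask |= cpu_bit(cpu);
    }
    else if ( !use_shortcut || !idlers_empty )
    {
        /* Idlers that can run the waking VCPU. */
        idle_mask = idlers & new_aff;

        /* None suitable and new outranks cur: preempt here, move cur. */
        if ( idle_mask == 0 && new_pri > cur_pri )
        {
            mask |= cpu_bit(cpu);
            idle_mask = idlers & cur_aff;
        }

        /* Tickle all suitable idlers (no-op if idle_mask is empty). */
        mask |= idle_mask;
    }
    return mask;
}

/* Exhaustively compare both variants on a 3-PCPU host; count mismatches. */
static int count_mismatches(void)
{
    static const int pris[] = { PRI_IDLE, PRI_TS_OVER, PRI_TS_UNDER, PRI_TS_BOOST };
    int mismatches = 0;

    for ( int cpu = 0; cpu < 3; cpu++ )
        for ( int c = 0; c < 4; c++ )
            for ( int n = 0; n < 4; n++ )
                for ( mask_t idl = 0; idl < 8; idl++ )
                    for ( mask_t na = 0; na < 8; na++ )
                        for ( mask_t ca = 0; ca < 8; ca++ )
                            if ( tickle_mask(cpu, pris[c], pris[n], idl, na, ca, true)
                                 != tickle_mask(cpu, pris[c], pris[n], idl, na, ca, false) )
                                mismatches++;
    return mismatches;
}
```

In this model count_mismatches() returns 0. When idlers is empty, both idle_mask computations produce an empty mask, so the else branch can only set cpu when new outranks cur, which is exactly what the idlers_empty test in the first branch does. The shortcut only saves the cpumask_and() calls.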

Thoughts?

  -George


[-- Attachment #2: xen_sched_credit_improve_tickling_of_idle_cpus --]
[-- Type: text/plain, Size: 3773 bytes --]

xen: sched_credit, improve tickling of idle CPUs

RFC: Re-organized ifs

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,54 +249,53 @@ static inline void
     struct csched_vcpu * const cur =
         CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
+    int idlers_empty;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    idlers_empty = cpumask_empty(prv->idlers);
+    /*
+     * If the pcpu is idle, or there are no idlers and the new
+     * vcpu has a higher priority than the current vcpu, run it here.
+     *
+     * If there are idle cpus, first try to find one suitable to run
+     * "new", so we can avoid preempting cur.  If we cannot find a
+     * suitable idler on which to run "new", run it here, but try to
+     * find a suitable idler on which to run "cur" instead.
+     */
+    if ( cur->pri == CSCHED_PRI_IDLE
+         || (idlers_empty && new->pri > cur->pri) )
     {
-        if ( cur->pri == CSCHED_PRI_IDLE )
-            SCHED_STAT_CRANK(tickle_local_idler);
-        else if ( cur->pri == CSCHED_PRI_TS_OVER )
-            SCHED_STAT_CRANK(tickle_local_over);
-        else if ( cur->pri == CSCHED_PRI_TS_UNDER )
-            SCHED_STAT_CRANK(tickle_local_under);
-        else
-            SCHED_STAT_CRANK(tickle_local_other);
-
         cpumask_set_cpu(cpu, &mask);
     }
+    else if ( !idlers_empty )
+    {
+        /* Check whether or not there are idlers that can run new */
+        cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
 
-    /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
-     * let them know there is runnable work in the system...
-     */
-    if ( cur->pri > CSCHED_PRI_IDLE )
-    {
-        if ( cpumask_empty(prv->idlers) )
+        /* If there are no suitable idlers for new, and it's higher
+         * priority than cur, wake up the current cpu, but also
+         * look for idlers suitable for cur. */
+        if ( cpumask_empty(&idle_mask) && new->pri > cur->pri )
         {
-            SCHED_STAT_CRANK(tickle_idlers_none);
+            cpumask_set_cpu(cpu, &mask);
+            cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
         }
-        else
+
+        /* Which of the idlers shall we wake up? */
+        if ( !cpumask_empty(&idle_mask) )
         {
-            cpumask_t idle_mask;
-
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
-            if ( !cpumask_empty(&idle_mask) )
+            SCHED_STAT_CRANK(tickle_idlers_some);
+            if ( opt_tickle_one_idle )
             {
-                SCHED_STAT_CRANK(tickle_idlers_some);
-                if ( opt_tickle_one_idle )
-                {
-                    this_cpu(last_tickle_cpu) = 
-                        cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
-                    cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
-                }
-                else
-                    cpumask_or(&mask, &mask, &idle_mask);
+                this_cpu(last_tickle_cpu) = 
+                    cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+                cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
+            else
+                cpumask_or(&mask, &mask, &idle_mask);
         }
     }
 

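With opt_tickle_one_idle set, the patch tickles a single idler, chosen round-robin by cpumask_cycle() starting from the per-CPU last_tickle_cpu. The sketch below models that choice on a uint64_t bitmap. It assumes cpumask_cycle(n, mask) returns the next set CPU after n, wraps around to the first set CPU, and returns the number of CPUs when the mask is empty; next_set_from, cycle_next, pick_idler and SKETCH_NR_CPUS are hypothetical names, not Xen's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64  /* hypothetical: one bit per CPU in a uint64_t */

/* Lowest set bit index >= start, or SKETCH_NR_CPUS if there is none. */
static int next_set_from(int start, uint64_t mask)
{
    for ( int i = start; i < SKETCH_NR_CPUS; i++ )
        if ( mask & (UINT64_C(1) << i) )
            return i;
    return SKETCH_NR_CPUS;
}

/* Model of cpumask_cycle(n, mask): next CPU after n in mask, wrapping. */
static int cycle_next(int n, uint64_t mask)
{
    int nxt = next_set_from(n + 1, mask);

    if ( nxt == SKETCH_NR_CPUS )
        nxt = next_set_from(0, mask);
    return nxt;
}

/* Model of the last_tickle_cpu update: advance the cursor, return it. */
static int pick_idler(int *last_tickle, uint64_t idle_mask)
{
    *last_tickle = cycle_next(*last_tickle, idle_mask);
    return *last_tickle;
}
```

Starting from a cursor of 0 with idlers {1, 3} (mask 0xA), successive pick_idler() calls return 1, 3, 1, ..., so repeated wakeups spread over the suitable idlers instead of always hitting the lowest-numbered one.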
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Thread overview: 18+ messages
2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
2012-12-03 17:12   ` Ian Campbell
2012-12-03 18:26     ` Dario Faggioli
2012-12-05 12:16   ` George Dunlap [this message]
2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
2012-12-04 18:53   ` George Dunlap
2012-12-04 18:55     ` George Dunlap
2012-12-05 11:57     ` Dario Faggioli
2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
2012-12-04 19:10   ` George Dunlap
2012-12-05 11:54     ` Dario Faggioli
2012-12-05 11:51       ` George Dunlap
2012-12-05 12:01       ` Ian Campbell
2012-12-05 12:15         ` Dario Faggioli
2012-12-05 12:20           ` Ian Campbell
2012-12-05 12:25             ` George Dunlap
2012-12-05 12:38           ` Mats Petersson
