From: Dario Faggioli <raistlin@linux.it>
To: xen-devel <xen-devel@lists.xensource.com>
Cc: Keir Fraser <keir@xen.org>, George Dunlap <george.dunlap@citrix.com>
Subject: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
Date: Mon, 03 Dec 2012 17:34:58 +0100
Message-ID: <dde3de6d81a3014f1d13.1354552498@Solace>
In-Reply-To: <patchbomb.1354552497@Solace>

Right now, when a VCPU wakes up, we check whether it should preempt
what is running on the PCPU, and whether the waking VCPU can be
migrated (by tickling some idlers). However, this can result in
suboptimal or even wrong behaviour, as explained here:

 http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html

With this change, when deciding which PCPUs to tickle upon VCPU
wake-up, we consider both what is likely to happen on the PCPU
where the wakeup occurs and whether there are idle PCPUs where
the waking VCPU could run.
In fact, if there are idlers where the new VCPU can run, we can
avoid interrupting the running VCPU. OTOH, if there are no such
PCPUs, preemption and migration are the way to go.

This has been tested by running the following benchmarks inside 2,
6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
960 MB of memory (host has 16 ways and 12 GB RAM).

1) All VMs had 'cpus="all"' in their config file.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
 | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
 | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
 | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
 | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
$ specjbb2005 ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
 | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
 | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |


2) All VMs had their VCPUs statically pinned to the host's PCPUs.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
 | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
 | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
 | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
 | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
$ specjbb2005 ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
 | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
 | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |


The numbers show that the change has either no impact or a very
limited one (the specjbb2005 case) or, where it does have some
impact, that it is an actual improvement in performance, especially
in the sysbench-memory case.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,13 +249,25 @@ static inline void
     struct csched_vcpu * const cur =
         CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    /* Check whether or not there are idlers that can run new */
+    cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+
+    /*
+     * Should we ask cpu to reschedule? Well, if new can preempt cur,
+     * and there isn't any other place where it can run, we do. OTOH,
+     * if there are idlers where new can run, we can avoid interrupting
+     * cur and ask them to come and pick new up. So no, in that case, we
+     * do not signal cpu, avoiding an unnecessary migration of a running
+     * VCPU. It is true that we are (probably) migrating new, but as it
+     * is waking up, it likely is cache-cold anyway.
+     */
+    if ( new->pri > cur->pri &&
+         (cur->pri == CSCHED_PRI_IDLE || cpumask_empty(&idle_mask)) )
     {
         if ( cur->pri == CSCHED_PRI_IDLE )
             SCHED_STAT_CRANK(tickle_local_idler);
@@ -270,7 +282,7 @@ static inline void
     }
 
     /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
+     * If this CPU has at least two runnable VCPUs, we tickle some idlers to
      * let them know there is runnable work in the system...
      */
     if ( cur->pri > CSCHED_PRI_IDLE )
@@ -281,9 +293,16 @@ static inline void
         }
         else
         {
-            cpumask_t idle_mask;
 
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+            /*
+             * If there aren't any idlers for new, then letting new preempt
+             * cur, and trying to migrate cur, becomes inevitable. If that is
+             * the case, update the mask of the to-be-tickled CPUs accordingly
+             * (i.e., with cur's idlers instead of new's).
+             */
+            if ( new->pri > cur->pri && cpumask_empty(&idle_mask) )
+                cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
+
             if ( !cpumask_empty(&idle_mask) )
             {
                 SCHED_STAT_CRANK(tickle_idlers_some);
@@ -296,7 +315,6 @@ static inline void
                 else
                     cpumask_or(&mask, &mask, &idle_mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
         }
     }

