xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing
@ 2012-12-17 22:28 Dario Faggioli
  2012-12-17 22:28 ` [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu) Dario Faggioli
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

Hello,

Here's the take 3 of this series (last round here:
 http://comments.gmane.org/gmane.comp.emulators.xen.devel/145998).

Super quickly, this is about fixing a couple of anomalies in the  credit
scheduler and adding some tracing to it.  All the comments raised during v2's
review have been addressed.

Quick summary of the series (* = Acked):

   1/5 xen: sched_credit: define and use curr_on_cpu(cpu)
   2/5 xen: sched_credit: improve picking up the idle CPU for a VCPU
 * 3/5 xen: sched_credit: improve tickling of idle CPUs
 * 4/5 xen: tracing: introduce per-scheduler trace event IDs
   5/5 xen: sched_credit: add some tracing

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu)
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
@ 2012-12-17 22:28 ` Dario Faggioli
  2012-12-18 12:10   ` George Dunlap
  2012-12-17 22:28 ` [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU Dario Faggioli
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

To fetch `per_cpu(schedule_data,cpu).curr' in a more readable
way. It's in sched-if.h as that is where `struct schedule_data'
is declared.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
* This patch now contains both macro definition and usage, (and
  has been moved to the top of the series), as suggested during
  review.
* The macro has been moved moved to sched-if.h, as requested
  during review.
* The macro has been renamed curr_on_cpu(), to match with the
  `*curr' field in `struct schedule_data' to which it points.

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -228,7 +228,7 @@ static void burn_credits(struct csched_v
     unsigned int credits;
 
     /* Assert svc is current */
-    ASSERT(svc==CSCHED_VCPU(per_cpu(schedule_data, svc->vcpu->processor).curr));
+    ASSERT( svc == CSCHED_VCPU(curr_on_cpu(svc->vcpu->processor)) );
 
     if ( (delta = now - svc->start_time) <= 0 )
         return;
@@ -246,8 +246,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle
 static inline void
 __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 {
-    struct csched_vcpu * const cur =
-        CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
+    struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
     cpumask_t mask;
 
@@ -371,7 +370,7 @@ csched_alloc_pdata(const struct schedule
         per_cpu(schedule_data, cpu).sched_priv = spc;
 
     /* Start off idling... */
-    BUG_ON(!is_idle_vcpu(per_cpu(schedule_data, cpu).curr));
+    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
 
     spin_unlock_irqrestore(&prv->lock, flags);
@@ -709,7 +708,7 @@ csched_vcpu_sleep(const struct scheduler
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( per_cpu(schedule_data, vc->processor).curr == vc )
+    if ( curr_on_cpu(vc->processor) == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( __vcpu_on_runq(svc) )
         __runq_remove(svc);
@@ -723,7 +722,7 @@ csched_vcpu_wake(const struct scheduler 
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    if ( unlikely(per_cpu(schedule_data, cpu).curr == vc) )
+    if ( unlikely(curr_on_cpu(cpu) == vc) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
@@ -1192,7 +1191,7 @@ static struct csched_vcpu *
 csched_runq_steal(int peer_cpu, int cpu, int pri)
 {
     const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
-    const struct vcpu * const peer_vcpu = per_cpu(schedule_data, peer_cpu).curr;
+    const struct vcpu * const peer_vcpu = curr_on_cpu(peer_cpu);
     struct csched_vcpu *speer;
     struct list_head *iter;
     struct vcpu *vc;
@@ -1480,7 +1479,7 @@ csched_dump_pcpu(const struct scheduler 
     printk("core=%s\n", cpustr);
 
     /* current VCPU */
-    svc = CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
+    svc = CSCHED_VCPU(curr_on_cpu(cpu));
     if ( svc )
     {
         printk("\trun: ");
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -41,6 +41,8 @@ struct schedule_data {
     atomic_t            urgent_count;   /* how many urgent vcpus           */
 };
 
+#define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
+
 DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
  2012-12-17 22:28 ` [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu) Dario Faggioli
@ 2012-12-17 22:28 ` Dario Faggioli
  2012-12-18 12:12   ` George Dunlap
  2012-12-17 22:29 ` [PATCH 3 of 5 v3] xen: sched_credit: improve tickling of idle CPUs Dario Faggioli
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

In _csched_cpu_pick() we try to select the best possible CPU for
running a VCPU, considering the characteristics of the underlying
hardware (i.e., how many threads, core, sockets, and how busy they
are). What we want is "the idle execution vehicle with the most
idling neighbours in its grouping".

In order to achieve it, we select a CPU from the VCPU's affinity,
giving preference to its current processor if possible, as the basis
for the comparison with all the other CPUs. Problem is, to discount
the VCPU itself when computing this "idleness" (in an attempt to be
fair wrt its current processor), we arbitrarily and unconditionally
consider that selected CPU as idle, even when it is not the case,
for instance:
 1. If the CPU is not the one where the VCPU is running (perhaps due
    to the affinity being changed);
 2. The CPU is where the VCPU is running, but it has other VCPUs in
    its runq, so it won't go idle even if the VCPU in question goes.

This is exemplified in the trace below:

]  3.466115364 x|------|------| d10v1   22005(2:2:5) 3 [ a 1 8 ]
   ... ... ...
   3.466122856 x|------|------| d10v1 runstate_change d10v1 running->offline
   3.466123046 x|------|------| d?v? runstate_change d32767v0 runnable->running
   ... ... ...
]  3.466126887 x|------|------| d32767v0   28004(2:8:4) 3 [ a 1 8 ]

22005(...) line (the first line) means _csched_cpu_pick() was called on
VCPU 1 of domain 10, while it is running on CPU 0, and it choose CPU 8,
which is busy ('|'), even if there are plenty of idle CPUs. That is
because, as a consequence of changing the VCPU affinity, CPU 8 was
chosen as the basis for the comparison, and therefore considered idle
(its bit gets unconditionally set in the bitmask representing the idle
CPUs). 28004(...) line means the VCPU is woken up and queued on CPU 8's
runq, where it waits for a context switch or a migration, in order to
be able to execute.

This change fixes things by only considering the "guessed" CPU idle if
the VCPU in question is both running there and is its only runnable
VCPU.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * Use `vc->processor' instead of curr_on_cpu() for determining whether
   or not vc is current on cpu, as suggested during review.
 * Fixed IS_IDLE_RUNQ() macro in case runq is empty.
 * Ditched the variable renaming, as requested during review.

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -59,6 +59,9 @@
 #define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
 #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
+/* Is the first element of _cpu's runq its idle vcpu? */
+#define IS_RUNQ_IDLE(_cpu)  (list_empty(RUNQ(_cpu)) || \
+                             is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
 
 
 /*
@@ -478,9 +481,14 @@ static int
      * distinct cores first and guarantees we don't do something stupid
      * like run two VCPUs on co-hyperthreads while there are idle cores
      * or sockets.
+     *
+     * Notice that, when computing the "idleness" of cpu, we may want to
+     * discount vc. That is, iff vc is the currently running and the only
+     * runnable vcpu on cpu, we add cpu to the idlers.
      */
     cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-    cpumask_set_cpu(cpu, &idlers);
+    if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) )
+        cpumask_set_cpu(cpu, &idlers);
     cpumask_and(&cpus, &cpus, &idlers);
     cpumask_clear_cpu(cpu, &cpus);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3 of 5 v3] xen: sched_credit: improve tickling of idle CPUs
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
  2012-12-17 22:28 ` [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu) Dario Faggioli
  2012-12-17 22:28 ` [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU Dario Faggioli
@ 2012-12-17 22:29 ` Dario Faggioli
  2012-12-17 22:29 ` [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

Right now, when a VCPU wakes-up, we check whether it should preempt
what is running on the PCPU, and whether or not the waking VCPU can
be migrated (by tickling some idlers). However, this can result in
suboptimal or even wrong behaviour, as explained here:

 http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html

This change, instead, when deciding which PCPU(s) to tickle, upon
VCPU wake-up, considers both what it is likely to happen on the PCPU
where the wakeup occurs,and whether or not there are idlers where
the woken-up VCPU can run. In fact, if there are, we can avoid
interrupting the running VCPU. Only in case there aren't any of
these PCPUs, preemption and migration are the way to go.

This has been tested (on top of the previous change) by running
the following benchmarks inside 2, 6 and 10 VMs, concurrently, on
a shared host, each with 2 VCPUs and 960 MB of memory (host had 16
ways and 12 GB RAM).

1) All VMs had 'cpus="all"' in their config file.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 50.078467 +/- 1.6676162 | 49.673667 +/- 0.0094321 |
 | 6   | 63.259472 +/- 0.1137586 | 61.680011 +/- 1.0208723 |
 | 10  | 91.246797 +/- 0.1154008 | 90.396720 +/- 1.5900423 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 485.56333 +/- 6.0527356 | 487.83167 +/- 0.7602850 |
 | 6   | 401.36278 +/- 1.9745916 | 409.96778 +/- 3.6761092 |
 | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
$ specjbb2005 ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 43150.63 +/- 1359.5616  | 43275.427 +/- 606.28185 |
 | 6   | 29274.29 +/- 1024.4042  | 29716.189 +/- 1290.1878 |
 | 10  | 19061.28 +/- 512.88561  | 19192.599 +/- 605.66058 |


2) All VMs had their VCPUs statically pinned to the host's PCPUs.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
 | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
 | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
 | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
 | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
$ specjbb2005 ... (throughput, higher is better)
 | 2   | 49591.057 +/- 952.93384 | 49594.195 +/- 799.57976 |
 | 6   | 33538.247 +/- 1089.2115 | 33671.758 +/- 1077.6806 |
 | 10  | 21927.870 +/- 831.88742 | 21891.131 +/- 563.37929 |


Numbers show how the change has either no or very limited impact
(specjbb2005 case) or, when it does have some impact, that is a
real improvement in performances (sysbench-memory case).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v1:
 * Rewritten as per George's suggestion, in order to improve readability.
 * Killed some of the stats, with the only exception of `tickle_idlers_none'
   and `tickle_idlers_some'. They don't make things look that terrible and
   I think they could be useful.
 * The preemption+migration of the currently running VCPU has been turned
   into a migration request, instead than just tickling. I traced this
   thing some more, and it looks like that is the way to go. Tickling is
   not effective here, because the woken PCPU would expect cur to be out
   of the scheduler tail, which is likely false (cur->is_running is
   still set to 1).

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -134,6 +134,7 @@ struct csched_vcpu {
         uint32_t state_idle;
         uint32_t migrate_q;
         uint32_t migrate_r;
+        uint32_t kicked_away;
     } stats;
 #endif
 };
@@ -251,54 +252,67 @@ static inline void
 {
     struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
+    int idlers_empty;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    idlers_empty = cpumask_empty(prv->idlers);
+
+    /*
+     * If the pcpu is idle, or there are no idlers and the new
+     * vcpu is a higher priority than the old vcpu, run it here.
+     *
+     * If there are idle cpus, first try to find one suitable to run
+     * new, so we can avoid preempting cur.  If we cannot find a
+     * suitable idler on which to run new, run it here, but try to
+     * find a suitable idler on which to run cur instead.
+     */
+    if ( cur->pri == CSCHED_PRI_IDLE
+         || (idlers_empty && new->pri > cur->pri) )
     {
-        if ( cur->pri == CSCHED_PRI_IDLE )
-            SCHED_STAT_CRANK(tickle_local_idler);
-        else if ( cur->pri == CSCHED_PRI_TS_OVER )
-            SCHED_STAT_CRANK(tickle_local_over);
-        else if ( cur->pri == CSCHED_PRI_TS_UNDER )
-            SCHED_STAT_CRANK(tickle_local_under);
-        else
-            SCHED_STAT_CRANK(tickle_local_other);
-
+        if ( cur->pri != CSCHED_PRI_IDLE )
+            SCHED_STAT_CRANK(tickle_idlers_none);
         cpumask_set_cpu(cpu, &mask);
     }
+    else if ( !idlers_empty )
+    {
+        /* Check whether or not there are idlers that can run new */
+        cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
 
-    /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
-     * let them know there is runnable work in the system...
-     */
-    if ( cur->pri > CSCHED_PRI_IDLE )
-    {
-        if ( cpumask_empty(prv->idlers) )
+        /*
+         * If there are no suitable idlers for new, and it's higher
+         * priority than cur, ask the scheduler to migrate cur away.
+         * We have to act like this (instead of just waking some of
+         * the idlers suitable for cur) because cur is running.
+         *
+         * If there are suitable idlers for new, no matter priorities,
+         * leave cur alone (as it is running and is, likely, cache-hot)
+         * and wake some of them (which is waking up and so is, likely,
+         * cache cold anyway).
+         */
+        if ( cpumask_empty(&idle_mask) && new->pri > cur->pri )
         {
             SCHED_STAT_CRANK(tickle_idlers_none);
+            SCHED_VCPU_STAT_CRANK(cur, kicked_away);
+            SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+            SCHED_STAT_CRANK(migrate_kicked_away);
+            set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+            cpumask_set_cpu(cpu, &mask);
         }
-        else
+        else if ( !cpumask_empty(&idle_mask) )
         {
-            cpumask_t idle_mask;
-
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
-            if ( !cpumask_empty(&idle_mask) )
+            /* Which of the idlers suitable for new shall we wake up? */
+            SCHED_STAT_CRANK(tickle_idlers_some);
+            if ( opt_tickle_one_idle )
             {
-                SCHED_STAT_CRANK(tickle_idlers_some);
-                if ( opt_tickle_one_idle )
-                {
-                    this_cpu(last_tickle_cpu) = 
-                        cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
-                    cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
-                }
-                else
-                    cpumask_or(&mask, &mask, &idle_mask);
+                this_cpu(last_tickle_cpu) =
+                    cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+                cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
+            else
+                cpumask_or(&mask, &mask, &idle_mask);
         }
     }
 
@@ -1456,13 +1470,14 @@ csched_dump_vcpu(struct csched_vcpu *svc
     {
         printk(" credit=%i [w=%u]", atomic_read(&svc->credit), sdom->weight);
 #ifdef CSCHED_STATS
-        printk(" (%d+%u) {a/i=%u/%u m=%u+%u}",
+        printk(" (%d+%u) {a/i=%u/%u m=%u+%u (k=%u)}",
                 svc->stats.credit_last,
                 svc->stats.credit_incr,
                 svc->stats.state_active,
                 svc->stats.state_idle,
                 svc->stats.migrate_q,
-                svc->stats.migrate_r);
+                svc->stats.migrate_r,
+                svc->stats.kicked_away);
 #endif
     }
 
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -39,10 +39,6 @@ PERFCOUNTER(vcpu_wake_runnable,     "csc
 PERFCOUNTER(vcpu_wake_not_runnable, "csched: vcpu_wake_not_runnable")
 PERFCOUNTER(vcpu_park,              "csched: vcpu_park")
 PERFCOUNTER(vcpu_unpark,            "csched: vcpu_unpark")
-PERFCOUNTER(tickle_local_idler,     "csched: tickle_local_idler")
-PERFCOUNTER(tickle_local_over,      "csched: tickle_local_over")
-PERFCOUNTER(tickle_local_under,     "csched: tickle_local_under")
-PERFCOUNTER(tickle_local_other,     "csched: tickle_local_other")
 PERFCOUNTER(tickle_idlers_none,     "csched: tickle_idlers_none")
 PERFCOUNTER(tickle_idlers_some,     "csched: tickle_idlers_some")
 PERFCOUNTER(load_balance_idle,      "csched: load_balance_idle")
@@ -52,6 +48,7 @@ PERFCOUNTER(steal_trylock_failed,   "csc
 PERFCOUNTER(steal_peer_idle,        "csched: steal_peer_idle")
 PERFCOUNTER(migrate_queued,         "csched: migrate_queued")
 PERFCOUNTER(migrate_running,        "csched: migrate_running")
+PERFCOUNTER(migrate_kicked_away,    "csched: migrate_kicked_away")
 PERFCOUNTER(vcpu_hot,               "csched: vcpu_hot")
 
 PERFCOUNTER(need_flush_tlb_flush,   "PG_need_flush tlb flushes")

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
                   ` (2 preceding siblings ...)
  2012-12-17 22:29 ` [PATCH 3 of 5 v3] xen: sched_credit: improve tickling of idle CPUs Dario Faggioli
@ 2012-12-17 22:29 ` Dario Faggioli
  2013-03-08 16:08   ` George Dunlap
  2012-12-17 22:29 ` [PATCH 5 of 5 v3] xen: sched_credit: add some tracing Dario Faggioli
  2012-12-18 12:16 ` [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and " George Dunlap
  5 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

So that it becomes possible to create scheduler specific trace records,
within each scheduler, without worrying about the overlapping, and also
without giving up being able to recognise them univocally. The latter
is deemed as useful, since we can have more than one scheduler running
at the same time, thanks to cpupools.

The event ID is 12 bits, and this change uses the upper 3 of them for
the 'scheduler ID'. This means we're limited to 8 schedulers and to
512 scheduler specific tracing events. Both seem reasonable limitations
as of now.

This also converts the existing credit2 tracing (the only scheduler
generating tracing events up to now) to the new system.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v1:
 * The event ID generaion macro is now called `TRC_SCHED_CLASS_EVT()',
   and has been generalized and put in trace.h, as suggested.
 * The handling of per-scheduler tracing IDs and masks have been
   restructured, properly naming "ID" the numerical identifiers
   and "MASK" the bitmasks, as requested.

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -29,18 +29,22 @@
 #define d2printk(x...)
 //#define d2printk printk
 
-#define TRC_CSCHED2_TICK        TRC_SCHED_CLASS + 1
-#define TRC_CSCHED2_RUNQ_POS    TRC_SCHED_CLASS + 2
-#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3
-#define TRC_CSCHED2_CREDIT_ADD  TRC_SCHED_CLASS + 4
-#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5
-#define TRC_CSCHED2_TICKLE       TRC_SCHED_CLASS + 6
-#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7
-#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8
-#define TRC_CSCHED2_UPDATE_LOAD   TRC_SCHED_CLASS + 9
-#define TRC_CSCHED2_RUNQ_ASSIGN   TRC_SCHED_CLASS + 10
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD   TRC_SCHED_CLASS + 11
-#define TRC_CSCHED2_UPDATE_RUNQ_LOAD   TRC_SCHED_CLASS + 12
+/*
+ * Credit2 tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED2_TICK             TRC_SCHED_CLASS_EVT(CSCHED2, 1)
+#define TRC_CSCHED2_RUNQ_POS         TRC_SCHED_CLASS_EVT(CSCHED2, 2)
+#define TRC_CSCHED2_CREDIT_BURN      TRC_SCHED_CLASS_EVT(CSCHED2, 3)
+#define TRC_CSCHED2_CREDIT_ADD       TRC_SCHED_CLASS_EVT(CSCHED2, 4)
+#define TRC_CSCHED2_TICKLE_CHECK     TRC_SCHED_CLASS_EVT(CSCHED2, 5)
+#define TRC_CSCHED2_TICKLE           TRC_SCHED_CLASS_EVT(CSCHED2, 6)
+#define TRC_CSCHED2_CREDIT_RESET     TRC_SCHED_CLASS_EVT(CSCHED2, 7)
+#define TRC_CSCHED2_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(CSCHED2, 8)
+#define TRC_CSCHED2_UPDATE_LOAD      TRC_SCHED_CLASS_EVT(CSCHED2, 9)
+#define TRC_CSCHED2_RUNQ_ASSIGN      TRC_SCHED_CLASS_EVT(CSCHED2, 10)
+#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
+#define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 12)
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -57,6 +57,32 @@
 #define TRC_SCHED_CLASS     0x00022000   /* Scheduler-specific    */
 #define TRC_SCHED_VERBOSE   0x00028000   /* More inclusive scheduling */
 
+/*
+ * The highest 3 bits of the last 12 bits of TRC_SCHED_CLASS above are
+ * reserved for encoding what scheduler produced the information. The
+ * actual event is encoded in the last 9 bits.
+ *
+ * This means we have 8 scheduling IDs available (which means at most 8
+ * schedulers generating events) and, in each scheduler, up to 512
+ * different events.
+ */
+#define TRC_SCHED_ID_BITS 3
+#define TRC_SCHED_ID_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS)
+#define TRC_SCHED_ID_MASK (((1UL<<TRC_SCHED_ID_BITS) - 1) << TRC_SCHED_ID_SHIFT)
+#define TRC_SCHED_EVT_MASK (~(TRC_SCHED_ID_MASK))
+
+/* Per-scheduler IDs, to identify scheduler specific events */
+#define TRC_SCHED_CSCHED   0
+#define TRC_SCHED_CSCHED2  1
+#define TRC_SCHED_SEDF     2
+#define TRC_SCHED_ARINC653 3
+
+/* Per-scheduler tracing */
+#define TRC_SCHED_CLASS_EVT(_c, _e) \
+  ( ( TRC_SCHED_CLASS | \
+      ((TRC_SCHED_##_c << TRC_SCHED_ID_SHIFT) & TRC_SCHED_ID_MASK) ) + \
+    (_e & TRC_SCHED_EVT_MASK) )
+
 /* Trace classes for Hardware */
 #define TRC_HW_PM           0x00801000   /* Power management traces */
 #define TRC_HW_IRQ          0x00802000   /* Traces relating to the handling of IRQs */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5 of 5 v3] xen: sched_credit: add some tracing
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
                   ` (3 preceding siblings ...)
  2012-12-17 22:29 ` [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
@ 2012-12-17 22:29 ` Dario Faggioli
  2012-12-18 12:15   ` George Dunlap
  2012-12-18 12:16 ` [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and " George Dunlap
  5 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2012-12-17 22:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap, Ian Campbell

About tickling, and PCPU selection.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
* Call to `trace_var()' converted to `__trace_var()', as it originally
  was (something got messed up while reworking this for v2.
  Thanks George. :-) )

Changes from v1:
 * Dummy `struct d {}', accommodating `cpu' only, removed
   in spite of the much more readable `trace_var(..., sizeof(cpu), &cpu)',
   as suggested.

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -21,6 +21,7 @@
 #include <asm/atomic.h>
 #include <xen/errno.h>
 #include <xen/keyhandler.h>
+#include <xen/trace.h>
 
 
 /*
@@ -98,6 +99,18 @@
 
 
 /*
+ * Credit tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED, 1)
+#define TRC_CSCHED_ACCOUNT_START TRC_SCHED_CLASS_EVT(CSCHED, 2)
+#define TRC_CSCHED_ACCOUNT_STOP  TRC_SCHED_CLASS_EVT(CSCHED, 3)
+#define TRC_CSCHED_STOLEN_VCPU   TRC_SCHED_CLASS_EVT(CSCHED, 4)
+#define TRC_CSCHED_PICKED_CPU    TRC_SCHED_CLASS_EVT(CSCHED, 5)
+#define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
+
+
+/*
  * Boot parameters
  */
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
@@ -316,9 +329,18 @@ static inline void
         }
     }
 
-    /* Send scheduler interrupts to designated CPUs */
     if ( !cpumask_empty(&mask) )
+    {
+        if ( unlikely(tb_init_done) )
+        {
+            /* Avoid TRACE_*: saves checking !tb_init_done each step */
+            for_each_cpu(cpu, &mask)
+                __trace_var(TRC_CSCHED_TICKLE, 0, sizeof(cpu), &cpu);
+        }
+
+        /* Send scheduler interrupts to designated CPUs */
         cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);
+    }
 }
 
 static void
@@ -555,6 +577,8 @@ static int
     if ( commit && spc )
        spc->idle_bias = cpu;
 
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
+
     return cpu;
 }
 
@@ -587,6 +611,9 @@ static inline void
         }
     }
 
+    TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
@@ -609,6 +636,9 @@ static inline void
     {
         list_del_init(&sdom->active_sdom_elem);
     }
+
+    TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
 }
 
 static void
@@ -1242,6 +1272,8 @@ csched_runq_steal(int peer_cpu, int cpu,
             if (__csched_vcpu_is_migrateable(vc, cpu))
             {
                 /* We got a candidate. Grab it! */
+                TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
+                         vc->domain->domain_id, vc->vcpu_id);
                 SCHED_VCPU_STAT_CRANK(speer, migrate_q);
                 SCHED_STAT_CRANK(migrate_queued);
                 WARN_ON(vc->is_urgent);
@@ -1402,6 +1434,7 @@ csched_schedule(
     /* Tasklet work (which runs in idle VCPU context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
+        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
         snext = CSCHED_VCPU(idle_vcpu[cpu]);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu)
  2012-12-17 22:28 ` [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu) Dario Faggioli
@ 2012-12-18 12:10   ` George Dunlap
  0 siblings, 0 replies; 15+ messages in thread
From: George Dunlap @ 2012-12-18 12:10 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell

On 17/12/12 22:28, Dario Faggioli wrote:
> To fetch `per_cpu(schedule_data,cpu).curr' in a more readable
> way. It's in sched-if.h as that is where `struct schedule_data'
> is declared.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
> Changes from v2:
> * This patch now contains both macro definition and usage, (and
>    has been moved to the top of the series), as suggested during
>    review.
> * The macro has been moved moved to sched-if.h, as requested
>    during review.
> * The macro has been renamed curr_on_cpu(), to match with the
>    `*curr' field in `struct schedule_data' to which it points.
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -228,7 +228,7 @@ static void burn_credits(struct csched_v
>       unsigned int credits;
>
>       /* Assert svc is current */
> -    ASSERT(svc==CSCHED_VCPU(per_cpu(schedule_data, svc->vcpu->processor).curr));
> +    ASSERT( svc == CSCHED_VCPU(curr_on_cpu(svc->vcpu->processor)) );
>
>       if ( (delta = now - svc->start_time) <= 0 )
>           return;
> @@ -246,8 +246,7 @@ DEFINE_PER_CPU(unsigned int, last_tickle
>   static inline void
>   __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>   {
> -    struct csched_vcpu * const cur =
> -        CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
> +    struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu));
>       struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
>       cpumask_t mask;
>
> @@ -371,7 +370,7 @@ csched_alloc_pdata(const struct schedule
>           per_cpu(schedule_data, cpu).sched_priv = spc;
>
>       /* Start off idling... */
> -    BUG_ON(!is_idle_vcpu(per_cpu(schedule_data, cpu).curr));
> +    BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
>       cpumask_set_cpu(cpu, prv->idlers);
>
>       spin_unlock_irqrestore(&prv->lock, flags);
> @@ -709,7 +708,7 @@ csched_vcpu_sleep(const struct scheduler
>
>       BUG_ON( is_idle_vcpu(vc) );
>
> -    if ( per_cpu(schedule_data, vc->processor).curr == vc )
> +    if ( curr_on_cpu(vc->processor) == vc )
>           cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
>       else if ( __vcpu_on_runq(svc) )
>           __runq_remove(svc);
> @@ -723,7 +722,7 @@ csched_vcpu_wake(const struct scheduler
>
>       BUG_ON( is_idle_vcpu(vc) );
>
> -    if ( unlikely(per_cpu(schedule_data, cpu).curr == vc) )
> +    if ( unlikely(curr_on_cpu(cpu) == vc) )
>       {
>           SCHED_STAT_CRANK(vcpu_wake_running);
>           return;
> @@ -1192,7 +1191,7 @@ static struct csched_vcpu *
>   csched_runq_steal(int peer_cpu, int cpu, int pri)
>   {
>       const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu);
> -    const struct vcpu * const peer_vcpu = per_cpu(schedule_data, peer_cpu).curr;
> +    const struct vcpu * const peer_vcpu = curr_on_cpu(peer_cpu);
>       struct csched_vcpu *speer;
>       struct list_head *iter;
>       struct vcpu *vc;
> @@ -1480,7 +1479,7 @@ csched_dump_pcpu(const struct scheduler
>       printk("core=%s\n", cpustr);
>
>       /* current VCPU */
> -    svc = CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
> +    svc = CSCHED_VCPU(curr_on_cpu(cpu));
>       if ( svc )
>       {
>           printk("\trun: ");
> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
> --- a/xen/include/xen/sched-if.h
> +++ b/xen/include/xen/sched-if.h
> @@ -41,6 +41,8 @@ struct schedule_data {
>       atomic_t            urgent_count;   /* how many urgent vcpus           */
>   };
>
> +#define curr_on_cpu(c)    (per_cpu(schedule_data, c).curr)
> +
>   DECLARE_PER_CPU(struct schedule_data, schedule_data);
>   DECLARE_PER_CPU(struct scheduler *, scheduler);
>   DECLARE_PER_CPU(struct cpupool *, cpupool);
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU
  2012-12-17 22:28 ` [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU Dario Faggioli
@ 2012-12-18 12:12   ` George Dunlap
  0 siblings, 0 replies; 15+ messages in thread
From: George Dunlap @ 2012-12-18 12:12 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell

On 17/12/12 22:28, Dario Faggioli wrote:
> In _csched_cpu_pick() we try to select the best possible CPU for
> running a VCPU, considering the characteristics of the underlying
> hardware (i.e., how many threads, core, sockets, and how busy they
> are). What we want is "the idle execution vehicle with the most
> idling neighbours in its grouping".
>
> In order to achieve it, we select a CPU from the VCPU's affinity,
> giving preference to its current processor if possible, as the basis
> for the comparison with all the other CPUs. Problem is, to discount
> the VCPU itself when computing this "idleness" (in an attempt to be
> fair wrt its current processor), we arbitrarily and unconditionally
> consider that selected CPU as idle, even when it is not the case,
> for instance:
>   1. If the CPU is not the one where the VCPU is running (perhaps due
>      to the affinity being changed);
>   2. The CPU is where the VCPU is running, but it has other VCPUs in
>      its runq, so it won't go idle even if the VCPU in question goes.
>
> This is exemplified in the trace below:
>
> ]  3.466115364 x|------|------| d10v1   22005(2:2:5) 3 [ a 1 8 ]
>     ... ... ...
>     3.466122856 x|------|------| d10v1 runstate_change d10v1 running->offline
>     3.466123046 x|------|------| d?v? runstate_change d32767v0 runnable->running
>     ... ... ...
> ]  3.466126887 x|------|------| d32767v0   28004(2:8:4) 3 [ a 1 8 ]
>
> 22005(...) line (the first line) means _csched_cpu_pick() was called on
> VCPU 1 of domain 10, while it is running on CPU 0, and it choose CPU 8,
> which is busy ('|'), even if there are plenty of idle CPUs. That is
> because, as a consequence of changing the VCPU affinity, CPU 8 was
> chosen as the basis for the comparison, and therefore considered idle
> (its bit gets unconditionally set in the bitmask representing the idle
> CPUs). 28004(...) line means the VCPU is woken up and queued on CPU 8's
> runq, where it waits for a context switch or a migration, in order to
> be able to execute.
>
> This change fixes things by only considering the "guessed" CPU idle if
> the VCPU in question is both running there and is its only runnable
> VCPU.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
> Changes from v2:
>   * Use `vc->processor' instead of curr_on_cpu() for determining whether
>     or not vc is current on cpu, as suggested during review.
>   * Fixed IS_IDLE_RUNQ() macro in case runq is empty.
>   * Ditched the variable renaming, as requested during review.
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -59,6 +59,9 @@
>   #define CSCHED_VCPU(_vcpu)  ((struct csched_vcpu *) (_vcpu)->sched_priv)
>   #define CSCHED_DOM(_dom)    ((struct csched_dom *) (_dom)->sched_priv)
>   #define RUNQ(_cpu)          (&(CSCHED_PCPU(_cpu)->runq))
> +/* Is the first element of _cpu's runq its idle vcpu? */
> +#define IS_RUNQ_IDLE(_cpu)  (list_empty(RUNQ(_cpu)) || \
> +                             is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>
>
>   /*
> @@ -478,9 +481,14 @@ static int
>        * distinct cores first and guarantees we don't do something stupid
>        * like run two VCPUs on co-hyperthreads while there are idle cores
>        * or sockets.
> +     *
> +     * Notice that, when computing the "idleness" of cpu, we may want to
> +     * discount vc. That is, iff vc is the currently running and the only
> +     * runnable vcpu on cpu, we add cpu to the idlers.
>        */
>       cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
> -    cpumask_set_cpu(cpu, &idlers);
> +    if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) )
> +        cpumask_set_cpu(cpu, &idlers);
>       cpumask_and(&cpus, &cpus, &idlers);
>       cpumask_clear_cpu(cpu, &cpus);
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5 of 5 v3] xen: sched_credit: add some tracing
  2012-12-17 22:29 ` [PATCH 5 of 5 v3] xen: sched_credit: add some tracing Dario Faggioli
@ 2012-12-18 12:15   ` George Dunlap
  0 siblings, 0 replies; 15+ messages in thread
From: George Dunlap @ 2012-12-18 12:15 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell

On 17/12/12 22:29, Dario Faggioli wrote:
> About tickling, and PCPU selection.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
> Changes from v2:
> * Call to `trace_var()' converted to `__trace_var()', as it originally
>    was (something got messed up while reworking this for v2.
>    Thanks George. :-) )
>
> Changes from v1:
>   * Dummy `struct d {}', accommodating `cpu' only, removed
>     in spite of the much more readable `trace_var(..., sizeof(cpu), &cpu)',
>     as suggested.
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -21,6 +21,7 @@
>   #include <asm/atomic.h>
>   #include <xen/errno.h>
>   #include <xen/keyhandler.h>
> +#include <xen/trace.h>
>
>
>   /*
> @@ -98,6 +99,18 @@
>
>
>   /*
> + * Credit tracing events ("only" 512 available!). Check
> + * include/public/trace.h for more details.
> + */
> +#define TRC_CSCHED_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED, 1)
> +#define TRC_CSCHED_ACCOUNT_START TRC_SCHED_CLASS_EVT(CSCHED, 2)
> +#define TRC_CSCHED_ACCOUNT_STOP  TRC_SCHED_CLASS_EVT(CSCHED, 3)
> +#define TRC_CSCHED_STOLEN_VCPU   TRC_SCHED_CLASS_EVT(CSCHED, 4)
> +#define TRC_CSCHED_PICKED_CPU    TRC_SCHED_CLASS_EVT(CSCHED, 5)
> +#define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
> +
> +
> +/*
>    * Boot parameters
>    */
>   static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
> @@ -316,9 +329,18 @@ static inline void
>           }
>       }
>
> -    /* Send scheduler interrupts to designated CPUs */
>       if ( !cpumask_empty(&mask) )
> +    {
> +        if ( unlikely(tb_init_done) )
> +        {
> +            /* Avoid TRACE_*: saves checking !tb_init_done each step */
> +            for_each_cpu(cpu, &mask)
> +                __trace_var(TRC_CSCHED_TICKLE, 0, sizeof(cpu), &cpu);
> +        }
> +
> +        /* Send scheduler interrupts to designated CPUs */
>           cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);
> +    }
>   }
>
>   static void
> @@ -555,6 +577,8 @@ static int
>       if ( commit && spc )
>          spc->idle_bias = cpu;
>
> +    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
> +
>       return cpu;
>   }
>
> @@ -587,6 +611,9 @@ static inline void
>           }
>       }
>
> +    TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
> +             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
> +
>       spin_unlock_irqrestore(&prv->lock, flags);
>   }
>
> @@ -609,6 +636,9 @@ static inline void
>       {
>           list_del_init(&sdom->active_sdom_elem);
>       }
> +
> +    TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
> +             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
>   }
>
>   static void
> @@ -1242,6 +1272,8 @@ csched_runq_steal(int peer_cpu, int cpu,
>               if (__csched_vcpu_is_migrateable(vc, cpu))
>               {
>                   /* We got a candidate. Grab it! */
> +                TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
> +                         vc->domain->domain_id, vc->vcpu_id);
>                   SCHED_VCPU_STAT_CRANK(speer, migrate_q);
>                   SCHED_STAT_CRANK(migrate_queued);
>                   WARN_ON(vc->is_urgent);
> @@ -1402,6 +1434,7 @@ csched_schedule(
>       /* Tasklet work (which runs in idle VCPU context) overrides all else. */
>       if ( tasklet_work_scheduled )
>       {
> +        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
>           snext = CSCHED_VCPU(idle_vcpu[cpu]);
>           snext->pri = CSCHED_PRI_TS_BOOST;
>       }
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing
  2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
                   ` (4 preceding siblings ...)
  2012-12-17 22:29 ` [PATCH 5 of 5 v3] xen: sched_credit: add some tracing Dario Faggioli
@ 2012-12-18 12:16 ` George Dunlap
  2012-12-18 14:16   ` Dario Faggioli
  5 siblings, 1 reply; 15+ messages in thread
From: George Dunlap @ 2012-12-18 12:16 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell

On 17/12/12 22:28, Dario Faggioli wrote:
> Hello,
>
> Here's the take 3 of this series (last round here:
>   http://comments.gmane.org/gmane.comp.emulators.xen.devel/145998).
>
> Super quickly, this is about fixing a couple of anomalies in the  credit
> scheduler and adding some tracing to it.  All the comments raised during v2's
> review have been addressed.
>
> Quick summary of the series (* = Acked):
>
>     1/5 xen: sched_credit: define and use curr_on_cpu(cpu)
>     2/5 xen: sched_credit: improve picking up the idle CPU for a VCPU
>   * 3/5 xen: sched_credit: improve tickling of idle CPUs
>   * 4/5 xen: tracing: introduce per-scheduler trace event IDs
>     5/5 xen: sched_credit: add some tracing

Keir, Jan: All of the patches have Acks from me.

  -George

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing
  2012-12-18 12:16 ` [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and " George Dunlap
@ 2012-12-18 14:16   ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2012-12-18 14:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 768 bytes --]

On Tue, 2012-12-18 at 12:16 +0000, George Dunlap wrote:
> >     1/5 xen: sched_credit: define and use curr_on_cpu(cpu)
> >     2/5 xen: sched_credit: improve picking up the idle CPU for a VCPU
> >   * 3/5 xen: sched_credit: improve tickling of idle CPUs
> >   * 4/5 xen: tracing: introduce per-scheduler trace event IDs
> >     5/5 xen: sched_credit: add some tracing
> 
> Keir, Jan: All of the patches have Acks from me.
> 
Nice! Thanks George for looking at this (again)! :-P

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-17 22:29 ` [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
@ 2013-03-08 16:08   ` George Dunlap
  2013-03-09 10:40     ` Dario Faggioli
  0 siblings, 1 reply; 15+ messages in thread
From: George Dunlap @ 2013-03-08 16:08 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: xen-devel@lists.xensource.com, Keir Fraser, George Dunlap,
	Ian Campbell

On Mon, Dec 17, 2012 at 10:29 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> So that it becomes possible to create scheduler specific trace records,
> within each scheduler, without worrying about the overlapping, and also
> without giving up being able to recognise them univocally. The latter
> is deemed as useful, since we can have more than one scheduler running
> at the same time, thanks to cpupools.
>
> The event ID is 12 bits, and this change uses the upper 3 of them for
> the 'scheduler ID'. This means we're limited to 8 schedulers and to
> 512 scheduler specific tracing events. Both seem reasonable limitations
> as of now.
>
> This also converts the existing credit2 tracing (the only scheduler
> generating tracing events up to now) to the new system.

Dario, do you have xenalyze patches for this changeset somewhere?

 -George

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs
  2013-03-08 16:08   ` George Dunlap
@ 2013-03-09 10:40     ` Dario Faggioli
  2013-03-11 10:57       ` George Dunlap
  0 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2013-03-09 10:40 UTC (permalink / raw)
  To: George Dunlap
  Cc: George Dunlap, xen-devel@lists.xensource.com, Keir (Xen.org),
	Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 1514 bytes --]

On ven, 2013-03-08 at 16:08 +0000, George Dunlap wrote:
> On Mon, Dec 17, 2012 at 10:29 PM, Dario Faggioli
> > This also converts the existing credit2 tracing (the only scheduler
> > generating tracing events up to now) to the new system.
> 
> Dario, do you have xenalyze patches for this changeset somewhere?
> 
TBH, I haven't done any xenalyze modification, corresponding to this.
However, I'm not really sure it's necessary.

Well, let me put it this way, I run xenalyze on traces collected with
this change, and everything works, and the scheduler-id is shown in the
event. OF course we could improve the situation by decoding it in
xenalyze and showing it explicitly in the trace analysis. I (if this is
what you were referring to) agree it would be nice, but again, it's not
something I've done yet, and not something I can commit on doing in the
next days either. :-(

I don't think it's terrible, as the change only affect traces where the
actual event is also reported encoded, so you have to go check the
headers (or remember stuff) anyway. Again, I completely agree it would
be nice to limit the amount of stuff one has to remember, and I'm up for
it, just not immediately... Is that fine/enough?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs
  2013-03-09 10:40     ` Dario Faggioli
@ 2013-03-11 10:57       ` George Dunlap
  2013-03-11 19:10         ` Dario Faggioli
  0 siblings, 1 reply; 15+ messages in thread
From: George Dunlap @ 2013-03-11 10:57 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, xen-devel@lists.xensource.com, Keir (Xen.org),
	Ian Campbell

On 09/03/13 10:40, Dario Faggioli wrote:
> On ven, 2013-03-08 at 16:08 +0000, George Dunlap wrote:
>> On Mon, Dec 17, 2012 at 10:29 PM, Dario Faggioli
>>> This also converts the existing credit2 tracing (the only scheduler
>>> generating tracing events up to now) to the new system.
>> Dario, do you have xenalyze patches for this changeset somewhere?
>>
> TBH, I haven't done any xenalyze modification, corresponding to this.
> However, I'm not really sure it's necessary.

It does change the trace records slightly, by adding in the extra 
scheduler ID at the top.  It just requires propagating the changes in 
trace.h from the xen tree to xenalyze -- not hard to do; I jut wanted to 
know if you happened to have the patch lying around so I wouldn't have 
to go through and do it.  No worries. :-)

  -George

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs
  2013-03-11 10:57       ` George Dunlap
@ 2013-03-11 19:10         ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2013-03-11 19:10 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xensource.com, Keir (Xen.org), Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 1233 bytes --]

On lun, 2013-03-11 at 10:57 +0000, George Dunlap wrote:
> On 09/03/13 10:40, Dario Faggioli wrote:
> > TBH, I haven't done any xenalyze modification, corresponding to this.
> > However, I'm not really sure it's necessary.
> 
> It does change the trace records slightly, by adding in the extra 
> scheduler ID at the top.  It just requires propagating the changes in 
> trace.h from the xen tree to xenalyze -- not hard to do; 
>
Oh, I see... Then I guess I just failed to spot anything like that was
needed, perhaps because everything was JustWorking^TM, at least to my
inexperienced eyes. :-)

> I jut wanted to 
> know if you happened to have the patch lying around so I wouldn't have 
> to go through and do it.  No worries. :-)
> 
Ok, so, I think we should go for a "whoever gets to it first tell it to
the other (e.g., by Cc-ing him to the patch ;-P), to avoid duplicating
the (although small) effort" policy... Is that fine with you?

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-03-11 19:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-17 22:28 [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and add some tracing Dario Faggioli
2012-12-17 22:28 ` [PATCH 1 of 5 v3] xen: sched_credit: define and use curr_on_cpu(cpu) Dario Faggioli
2012-12-18 12:10   ` George Dunlap
2012-12-17 22:28 ` [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU Dario Faggioli
2012-12-18 12:12   ` George Dunlap
2012-12-17 22:29 ` [PATCH 3 of 5 v3] xen: sched_credit: improve tickling of idle CPUs Dario Faggioli
2012-12-17 22:29 ` [PATCH 4 of 5 v3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
2013-03-08 16:08   ` George Dunlap
2013-03-09 10:40     ` Dario Faggioli
2013-03-11 10:57       ` George Dunlap
2013-03-11 19:10         ` Dario Faggioli
2012-12-17 22:29 ` [PATCH 5 of 5 v3] xen: sched_credit: add some tracing Dario Faggioli
2012-12-18 12:15   ` George Dunlap
2012-12-18 12:16 ` [PATCH 0 of 5 v3] xen: sched_credit: fix picking and tickling and " George Dunlap
2012-12-18 14:16   ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).