xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing
@ 2012-12-03 16:34 Dario Faggioli
  2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-03 16:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap

Hello,

This small series deals with some weirdness in the mechanism with which the
credit scheduler choses what PCPU to tickle upon a VCPU wake-up.  Details are
available in the changelog of the first patch.

The new approach has been extensively benchmarked and proved itself either
beneficial or harmless. That means it does not introduce any significant amount
of overhead and/or performances regressions while, for some workloads, it
improves the performances quite sensibly (e.g., `sysbench --test=memory').

Full results in the first changelog too.

The rest of the series introduces some macros to enable generating
per-scheduler tracing events, retaining the possibility of distinguishing them,
even with more than one scheduler running at any given time (via cpupools), and
adds some tracing to the credit scheduler.

Thanks and Regards, Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
  2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
@ 2012-12-03 16:34 ` Dario Faggioli
  2012-12-03 17:12   ` Ian Campbell
  2012-12-05 12:16   ` George Dunlap
  2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
  2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
  2 siblings, 2 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-03 16:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap

Right now, when a VCPU wakes-up, we check if the it should preempt
what is running on the PCPU, and whether or not the waking VCPU can
be migrated (by tickling some idlers). However, this can result in
suboptimal or even wrong behaviour, as explained here:

 http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html

This change, instead, when deciding what PCPUs to tickle upon VCPU
wake-up, considers both what it is likely to happen on the PCPU
where the wakeup occurs, as well as whether or not there are idle
PCPUs where to run the waking VCPU.
In fact, if there are idlers where the new VCPU can run, we can
avoid interrupting the running VCPU. OTOH, in case there aren't
any of these PCPUs, preemption and migration are the way to go.

This has been tested by running the following benchmarks inside 2,
6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
960 MB of memory (host has 16 ways and 12 GB RAM).

1) All VMs had 'cpus="all"' in their config file.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
 | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
 | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
 | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
 | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
$ specjbb2005 ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
 | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
 | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |


2) All VMs had their VCPUs statically pinned to the host's PCPUs.

$ sysbench --test=cpu ... (time, lower is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
 | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
 | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
$ sysbench --test=memory ... (throughput, higher is better)
 | VMs | w/o this change         | w/ this change          |
 | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
 | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
 | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
$ specjbb2005 ... (throughput, higher is better)
 | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
 | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
 | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |


Numbers show how the change has either no or very limited impact
(specjbb2005 case) or, when it does have some impact, that is an
actual improvement in performances, especially in the
sysbench-memory case.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,13 +249,25 @@ static inline void
     struct csched_vcpu * const cur =
         CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    /* Check whether or not there are idlers that can run new */
+    cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+
+    /*
+     * Should we ask cpu to reschedule? Well, if new can preempt cur,
+     * and there isn't any other place where it can run, we do. OTOH,
+     * if there are idlers where new can run, we can avoid interrupting
+     * cur and ask them to come and pick new up. So no, in that case, we
+     * do not signal cpu, avoiding an unnecessary migration of a running
+     * VCPU. It is true that we are (probably) migrating new, but as it
+     * is waking up, it likely is cache-cold anyway.
+     */
+    if ( new->pri > cur->pri &&
+         (cur->pri == CSCHED_PRI_IDLE || cpumask_empty(&idle_mask)) )
     {
         if ( cur->pri == CSCHED_PRI_IDLE )
             SCHED_STAT_CRANK(tickle_local_idler);
@@ -270,7 +282,7 @@ static inline void
     }
 
     /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
+     * If this CPU has at least two runnable VCPUs, we tickle some idlers to
      * let them know there is runnable work in the system...
      */
     if ( cur->pri > CSCHED_PRI_IDLE )
@@ -281,9 +293,16 @@ static inline void
         }
         else
         {
-            cpumask_t idle_mask;
 
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+            /*
+             * If there aren't idlers for new, then letting new preempt cur and
+             * try to migrate cur becomes ineviable. If that is the case, update
+             * the mask of the to-be-tickled CPUs accordingly (i.e., with cur's
+             * idlers instead of new's).
+             */
+            if ( new->pri > cur->pri && cpumask_empty(&idle_mask) )
+                cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
+
             if ( !cpumask_empty(&idle_mask) )
             {
                 SCHED_STAT_CRANK(tickle_idlers_some);
@@ -296,7 +315,6 @@ static inline void
                 else
                     cpumask_or(&mask, &mask, &idle_mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
         }
     }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
  2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
@ 2012-12-03 16:34 ` Dario Faggioli
  2012-12-04 18:53   ` George Dunlap
  2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
  2 siblings, 1 reply; 18+ messages in thread
From: Dario Faggioli @ 2012-12-03 16:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap

So that it becomes possible to create specific scheduling event
within each scheduler without worrying about the overlapping,
and also without giving up being able to recognise them
univocally. The latter is deemed as useful, since we can have
more than one scheduler running at the same time, thanks to
cpupools.

The event ID is 12 bits, and this change uses the upper 3 of them
for the 'scheduler ID'. This means we're limited to 8 schedulers
and to 512 scheduler specific tracing events. Both seem reasonable
limitations as of now.

This also converts the existing credit2 tracing (the only scheduler
generating tracing events up to now) to the new system.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -29,18 +29,24 @@
 #define d2printk(x...)
 //#define d2printk printk
 
-#define TRC_CSCHED2_TICK        TRC_SCHED_CLASS + 1
-#define TRC_CSCHED2_RUNQ_POS    TRC_SCHED_CLASS + 2
-#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3
-#define TRC_CSCHED2_CREDIT_ADD  TRC_SCHED_CLASS + 4
-#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5
-#define TRC_CSCHED2_TICKLE       TRC_SCHED_CLASS + 6
-#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7
-#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8
-#define TRC_CSCHED2_UPDATE_LOAD   TRC_SCHED_CLASS + 9
-#define TRC_CSCHED2_RUNQ_ASSIGN   TRC_SCHED_CLASS + 10
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD   TRC_SCHED_CLASS + 11
-#define TRC_CSCHED2_UPDATE_RUNQ_LOAD   TRC_SCHED_CLASS + 12
+/*
+ * Credit2 tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2) + _e)
+
+#define TRC_CSCHED2_TICK             (TRC_CSCHED2_EVENT(1))
+#define TRC_CSCHED2_RUNQ_POS         (TRC_CSCHED2_EVENT(2))
+#define TRC_CSCHED2_CREDIT_BURN      (TRC_CSCHED2_EVENT(3))
+#define TRC_CSCHED2_CREDIT_ADD       (TRC_CSCHED2_EVENT(4))
+#define TRC_CSCHED2_TICKLE_CHECK     (TRC_CSCHED2_EVENT(5))
+#define TRC_CSCHED2_TICKLE           (TRC_CSCHED2_EVENT(6))
+#define TRC_CSCHED2_CREDIT_RESET     (TRC_CSCHED2_EVENT(7))
+#define TRC_CSCHED2_SCHED_TASKLET    (TRC_CSCHED2_EVENT(8))
+#define TRC_CSCHED2_UPDATE_LOAD      (TRC_CSCHED2_EVENT(9))
+#define TRC_CSCHED2_RUNQ_ASSIGN      (TRC_CSCHED2_EVENT(10))
+#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11))
+#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12))
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -57,6 +57,26 @@
 #define TRC_SCHED_CLASS     0x00022000   /* Scheduler-specific    */
 #define TRC_SCHED_VERBOSE   0x00028000   /* More inclusive scheduling */
 
+/*
+ * Per-scheduler masks, to identify scheduler specific events.
+ *
+ * The highest 3 bits of the last 12 bits of the above TRC_SCHED_*
+ * tracing classes are reserved for encoding what scheduler is producing
+ * the event. The last 9 bits is where the scheduler specific event is
+ * encoded.
+ *
+ * This means we can have at most 8 tracing scheduling masks (which
+ * means at most 8 schedulers generating tracing events) and, in each
+ * scheduler, up to 512 different events.
+ */
+#define TRC_SCHED_ID_BITS    3
+#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS)
+
+#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
+#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT)
+
 /* Trace classes for Hardware */
 #define TRC_HW_PM           0x00801000   /* Power management traces */
 #define TRC_HW_IRQ          0x00802000   /* Traces relating to the handling of IRQs */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
  2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
  2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
@ 2012-12-03 16:35 ` Dario Faggioli
  2012-12-04 19:10   ` George Dunlap
  2 siblings, 1 reply; 18+ messages in thread
From: Dario Faggioli @ 2012-12-03 16:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, George Dunlap

About tickling, and PCPU selection.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -21,6 +21,7 @@
 #include <asm/atomic.h>
 #include <xen/errno.h>
 #include <xen/keyhandler.h>
+#include <xen/trace.h>
 
 
 /*
@@ -95,6 +96,20 @@
 
 
 /*
+ * Credit tracing events ("only" 512 available!). Check
+ * include/public/trace.h for more details.
+ */
+#define TRC_CSCHED_EVENT(_e)     ((TRC_SCHED_CLASS|TRC_MASK_CSCHED) + _e)
+
+#define TRC_CSCHED_SCHED_TASKLET TRC_CSCHED_EVENT(1)
+#define TRC_CSCHED_ACCOUNT_START TRC_CSCHED_EVENT(2)
+#define TRC_CSCHED_ACCOUNT_STOP  TRC_CSCHED_EVENT(3)
+#define TRC_CSCHED_STOLEN_VCPU   TRC_CSCHED_EVENT(4)
+#define TRC_CSCHED_PICKED_CPU    TRC_CSCHED_EVENT(5)
+#define TRC_CSCHED_TICKLE        TRC_CSCHED_EVENT(6)
+
+
+/*
  * Boot parameters
  */
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
@@ -318,9 +333,26 @@ static inline void
         }
     }
 
-    /* Send scheduler interrupts to designated CPUs */
     if ( !cpumask_empty(&mask) )
+    {
+        if ( unlikely(tb_init_done) )
+        {
+            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
+            for_each_cpu(cpu, &mask)
+            {
+                struct {
+                    unsigned cpu:8;
+                } d;
+                d.cpu = cpu;
+                trace_var(TRC_CSCHED_TICKLE, 0,
+                          sizeof(d),
+                          (unsigned char*)&d);
+            }
+        }
+
+        /* Send scheduler interrupts to designated CPUs */
         cpumask_raise_softirq(&mask, SCHEDULE_SOFTIRQ);
+    }
 }
 
 static void
@@ -552,6 +584,8 @@ static int
     if ( commit && spc )
        spc->idle_bias = cpu;
 
+    TRACE_3D(TRC_CSCHED_PICKED_CPU, vc->domain->domain_id, vc->vcpu_id, cpu);
+
     return cpu;
 }
 
@@ -584,6 +618,9 @@ static inline void
         }
     }
 
+    TRACE_3D(TRC_CSCHED_ACCOUNT_START, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
+
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
@@ -606,6 +643,9 @@ static inline void
     {
         list_del_init(&sdom->active_sdom_elem);
     }
+
+    TRACE_3D(TRC_CSCHED_ACCOUNT_STOP, sdom->dom->domain_id,
+             svc->vcpu->vcpu_id, sdom->active_vcpu_count);
 }
 
 static void
@@ -1238,6 +1278,8 @@ csched_runq_steal(int peer_cpu, int cpu,
             if (__csched_vcpu_is_migrateable(vc, cpu))
             {
                 /* We got a candidate. Grab it! */
+                TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
+                         vc->domain->domain_id, vc->vcpu_id);
                 SCHED_VCPU_STAT_CRANK(speer, migrate_q);
                 SCHED_STAT_CRANK(migrate_queued);
                 WARN_ON(vc->is_urgent);
@@ -1398,6 +1440,7 @@ csched_schedule(
     /* Tasklet work (which runs in idle VCPU context) overrides all else. */
     if ( tasklet_work_scheduled )
     {
+        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
         snext = CSCHED_VCPU(idle_vcpu[cpu]);
         snext->pri = CSCHED_PRI_TS_BOOST;
     }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
  2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
@ 2012-12-03 17:12   ` Ian Campbell
  2012-12-03 18:26     ` Dario Faggioli
  2012-12-05 12:16   ` George Dunlap
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2012-12-03 17:12 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Keir (Xen.org)

On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote:
> +            /*
> +             * If there aren't idlers for new, then letting new preempt cur and
> +             * try to migrate cur becomes ineviable. If that is the case, update

"inevitable"

> +             * the mask of the to-be-tickled CPUs accordingly (i.e., with cur's
> +             * idlers instead of new's).
> +             */
> +            if ( new->pri > cur->pri && cpumask_empty(&idle_mask) )
> +                cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
> +

(I have nothing more intelligent to say...)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
  2012-12-03 17:12   ` Ian Campbell
@ 2012-12-03 18:26     ` Dario Faggioli
  0 siblings, 0 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-03 18:26 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel, Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1125 bytes --]

On Mon, 2012-12-03 at 17:12 +0000, Ian Campbell wrote:
> On Mon, 2012-12-03 at 16:34 +0000, Dario Faggioli wrote:
> > +            /*
> > +             * If there aren't idlers for new, then letting new preempt cur and
> > +             * try to migrate cur becomes ineviable. If that is the case, update
> 
> "inevitable"
> 
Gha! Again! :-(

I really need that 'comment spellchecker' thing! Actually, looking for
it a little bit more thoroughly, I ran into this (which I didn't know): 

http://vimdoc.sourceforge.net/htmldoc/spell.html

As well as into the fact that just doing a ':set spell' in a recent
enough version of Vim, while editing a C file, seems to do the job quite
nicely.

> (I have nothing more intelligent to say...)
> 
Well, it really was useful to me, as it triggered the above! :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
@ 2012-12-04 18:53   ` George Dunlap
  2012-12-04 18:55     ` George Dunlap
  2012-12-05 11:57     ` Dario Faggioli
  0 siblings, 2 replies; 18+ messages in thread
From: George Dunlap @ 2012-12-04 18:53 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Keir Fraser, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 4548 bytes --]

On Mon, Dec 3, 2012 at 4:34 PM, Dario Faggioli <raistlin@linux.it> wrote:

> So that it becomes possible to create specific scheduling event
> within each scheduler without worrying about the overlapping,
> and also without giving up being able to recognise them
> univocally. The latter is deemed as useful, since we can have
> more than one scheduler running at the same time, thanks to
> cpupools.
>
> The event ID is 12 bits, and this change uses the upper 3 of them
> for the 'scheduler ID'. This means we're limited to 8 schedulers
> and to 512 scheduler specific tracing events. Both seem reasonable
> limitations as of now.
>
> This also converts the existing credit2 tracing (the only scheduler
> generating tracing events up to now) to the new system.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>




>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -29,18 +29,24 @@
>  #define d2printk(x...)
>  //#define d2printk printk
>
> -#define TRC_CSCHED2_TICK        TRC_SCHED_CLASS + 1
> -#define TRC_CSCHED2_RUNQ_POS    TRC_SCHED_CLASS + 2
> -#define TRC_CSCHED2_CREDIT_BURN TRC_SCHED_CLASS + 3
> -#define TRC_CSCHED2_CREDIT_ADD  TRC_SCHED_CLASS + 4
> -#define TRC_CSCHED2_TICKLE_CHECK TRC_SCHED_CLASS + 5
> -#define TRC_CSCHED2_TICKLE       TRC_SCHED_CLASS + 6
> -#define TRC_CSCHED2_CREDIT_RESET TRC_SCHED_CLASS + 7
> -#define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS + 8
> -#define TRC_CSCHED2_UPDATE_LOAD   TRC_SCHED_CLASS + 9
> -#define TRC_CSCHED2_RUNQ_ASSIGN   TRC_SCHED_CLASS + 10
> -#define TRC_CSCHED2_UPDATE_VCPU_LOAD   TRC_SCHED_CLASS + 11
> -#define TRC_CSCHED2_UPDATE_RUNQ_LOAD   TRC_SCHED_CLASS + 12
> +/*
> + * Credit2 tracing events ("only" 512 available!). Check
> + * include/public/trace.h for more details.
> + */
> +#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|TRC_MASK_CSCHED2)
> + _e)
>

I think I would make this generic, and put it in trace.h -- maybe something
like this?  (Haven't run this through a compiler)

#define TRC_SCHED_CLASS_EVT(_c, _e) \

((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK))

Then these would look like:
#define TRC_CSCHED2_TICK TRC_SCHED_CLASS_EVT(CSCHED2,1)


> +
> +#define TRC_CSCHED2_TICK             (TRC_CSCHED2_EVENT(1))
> +#define TRC_CSCHED2_RUNQ_POS         (TRC_CSCHED2_EVENT(2))
> +#define TRC_CSCHED2_CREDIT_BURN      (TRC_CSCHED2_EVENT(3))
> +#define TRC_CSCHED2_CREDIT_ADD       (TRC_CSCHED2_EVENT(4))
> +#define TRC_CSCHED2_TICKLE_CHECK     (TRC_CSCHED2_EVENT(5))
> +#define TRC_CSCHED2_TICKLE           (TRC_CSCHED2_EVENT(6))
> +#define TRC_CSCHED2_CREDIT_RESET     (TRC_CSCHED2_EVENT(7))
> +#define TRC_CSCHED2_SCHED_TASKLET    (TRC_CSCHED2_EVENT(8))
> +#define TRC_CSCHED2_UPDATE_LOAD      (TRC_CSCHED2_EVENT(9))
> +#define TRC_CSCHED2_RUNQ_ASSIGN      (TRC_CSCHED2_EVENT(10))
> +#define TRC_CSCHED2_UPDATE_VCPU_LOAD (TRC_CSCHED2_EVENT(11))
> +#define TRC_CSCHED2_UPDATE_RUNQ_LOAD (TRC_CSCHED2_EVENT(12))
>
>  /*
>   * WARNING: This is still in an experimental phase.  Status and work can
> be found at the
> diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
> --- a/xen/include/public/trace.h
> +++ b/xen/include/public/trace.h
> @@ -57,6 +57,26 @@
>  #define TRC_SCHED_CLASS     0x00022000   /* Scheduler-specific    */
>  #define TRC_SCHED_VERBOSE   0x00028000   /* More inclusive scheduling */
>
> +/*
> + * Per-scheduler masks, to identify scheduler specific events.
> + *
> + * The highest 3 bits of the last 12 bits of the above TRC_SCHED_*
> + * tracing classes are reserved for encoding what scheduler is producing
> + * the event. The last 9 bits is where the scheduler specific event is
> + * encoded.
> + *
> + * This means we can have at most 8 tracing scheduling masks (which
> + * means at most 8 schedulers generating tracing events) and, in each
> + * scheduler, up to 512 different events.
> + */
> +#define TRC_SCHED_ID_BITS    3
> +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT - TRC_SCHED_ID_BITS)
> +
> +#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
> +#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT)
>

I don't think "mask" is right here -- these aren't masks, they're numerical
values. :-)  If we use something like the #define above, then we can do:

#define TRC_SCHED_CSCHED 0
#define TRC_SCHED_CSCHED2
/*...*/

 -George

[-- Attachment #1.2: Type: text/html, Size: 5523 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-04 18:53   ` George Dunlap
@ 2012-12-04 18:55     ` George Dunlap
  2012-12-05 11:57     ` Dario Faggioli
  1 sibling, 0 replies; 18+ messages in thread
From: George Dunlap @ 2012-12-04 18:55 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Keir Fraser, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 304 bytes --]

On Tue, Dec 4, 2012 at 6:53 PM, George Dunlap
<George.Dunlap@eu.citrix.com>wrote:

>
> #define TRC_SCHED_CSCHED 0
> #define TRC_SCHED_CSCHED2
> /*...*/
>

No idea what random key combination I pushed to make gmail just sent that
e-mail... anyway, that should be a "1" after CSCHED2.

Thoughts?

 -George

[-- Attachment #1.2: Type: text/html, Size: 758 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
@ 2012-12-04 19:10   ` George Dunlap
  2012-12-05 11:54     ` Dario Faggioli
  0 siblings, 1 reply; 18+ messages in thread
From: George Dunlap @ 2012-12-04 19:10 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Keir (Xen.org)

On 03/12/12 16:35, Dario Faggioli wrote:
> +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
> +            for_each_cpu(cpu, &mask)
> +            {
> +                struct {
> +                    unsigned cpu:8;
> +                } d;
> +                d.cpu = cpu;
> +                trace_var(TRC_CSCHED_TICKLE, 0,
> +                          sizeof(d),
> +                          (unsigned char*)&d);

Why not just TRC_1D()?  The tracing infrastructure can only set the size 
at a granularity of 32-bit words anyway, and at this point cpu is 
"unsigned int", which will be a single word.

Other than that, everything looks good.

  -George

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 11:54     ` Dario Faggioli
@ 2012-12-05 11:51       ` George Dunlap
  2012-12-05 12:01       ` Ian Campbell
  1 sibling, 0 replies; 18+ messages in thread
From: George Dunlap @ 2012-12-05 11:51 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Keir (Xen.org)

On 05/12/12 11:54, Dario Faggioli wrote:
> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote:
>> On 03/12/12 16:35, Dario Faggioli wrote:
>>> +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
>>> +            for_each_cpu(cpu, &mask)
>>> +            {
>>> +                struct {
>>> +                    unsigned cpu:8;
>>> +                } d;
>>> +                d.cpu = cpu;
>>> +                trace_var(TRC_CSCHED_TICKLE, 0,
>>> +                          sizeof(d),
>>> +                          (unsigned char*)&d);
>> Why not just TRC_1D()?
>>
> As I tried to explain in the comment, I just wanted to avoid checking
> for !tb_init_done more than once, as this happens within a loop and, at
> least potentially, there may be more CPUs to tickle (and thus more calls
> to TRACE_1D). I take this comment of yours as you not thinking that is
> something worthwhile, right? If so, I can definitely turn this into a
> "standard" TRACE_1D() call.

Oh right -- yeah, no sense in having a duplicate check on tb_init_done; 
but the struct is still pointless; just passing sizeof(cpu) and &cpu 
should be prettier (even if the complier will probably optimize it to 
the same thing).

  -George

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-04 19:10   ` George Dunlap
@ 2012-12-05 11:54     ` Dario Faggioli
  2012-12-05 11:51       ` George Dunlap
  2012-12-05 12:01       ` Ian Campbell
  0 siblings, 2 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-05 11:54 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1686 bytes --]

On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: 
> On 03/12/12 16:35, Dario Faggioli wrote:
> > +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
> > +            for_each_cpu(cpu, &mask)
> > +            {
> > +                struct {
> > +                    unsigned cpu:8;
> > +                } d;
> > +                d.cpu = cpu;
> > +                trace_var(TRC_CSCHED_TICKLE, 0,
> > +                          sizeof(d),
> > +                          (unsigned char*)&d);
> 
> Why not just TRC_1D()? 
>
As I tried to explain in the comment, I just wanted to avoid checking
for !tb_init_done more than once, as this happens within a loop and, at
least potentially, there may be more CPUs to tickle (and thus more calls
to TRACE_1D). I take this comment of yours as you not thinking that is
something worthwhile, right? If so, I can definitely turn this into a
"standard" TRACE_1D() call.

> The tracing infrastructure can only set the size 
> at a granularity of 32-bit words anyway, and at this point cpu is 
> "unsigned int", which will be a single word.
> 
I know that. I just followed suit from sched_credit2.c, but I agree it's
quite pointless for just one single field. Even if we decide to leave
the direct call to trace_var, I'll kill the dummy struct.

> Other than that, everything looks good.
> 
Ok, thanks. :-)
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs
  2012-12-04 18:53   ` George Dunlap
  2012-12-04 18:55     ` George Dunlap
@ 2012-12-05 11:57     ` Dario Faggioli
  1 sibling, 0 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-05 11:57 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 1665 bytes --]

On Tue, 2012-12-04 at 18:53 +0000, George Dunlap wrote: 
>         
>         +/*
>         + * Credit2 tracing events ("only" 512 available!). Check
>         + * include/public/trace.h for more details.
>         + */
>         +#define TRC_CSCHED2_EVENT(_e)        ((TRC_SCHED_CLASS|
>         TRC_MASK_CSCHED2) + _e) 
> 
> I think I would make this generic, and put it in trace.h 
>
Sounds good. Will do.

> -- maybe something like this?  (Haven't run this through a compiler)
> 
> #define TRC_SCHED_CLASS_EVT(_c, _e) \
> 
> ((TRC_SCHED_CLASS|(TRC_SCHED_##_c<<TRC_SCHED_MASK_SHIFT))+(_e&TRC_SCHED_CLASS_MASK))
> 
I'll try it and resend. 
>         +#define TRC_SCHED_ID_BITS    3
>         +#define TRC_SCHED_MASK_SHIFT (TRC_SUBCLS_SHIFT -
>         TRC_SCHED_ID_BITS)
>         +
>         +#define TRC_MASK_CSCHED      (0 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_CSCHED2     (1 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_SEDF        (2 << TRC_SCHED_MASK_SHIFT)
>         +#define TRC_MASK_ARINC653    (3 << TRC_SCHED_MASK_SHIFT) 
> 
> I don't think "mask" is right here -- these aren't masks, they're
> numerical values. :-)  If we use something like the #define above,
> then we can do:
> 
> #define TRC_SCHED_CSCHED 0
> #define TRC_SCHED_CSCHED2
> /*...*/
> 
I agree, bad name. (re "mask"). :-)

Thanks and Regards,
Dario 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 11:54     ` Dario Faggioli
  2012-12-05 11:51       ` George Dunlap
@ 2012-12-05 12:01       ` Ian Campbell
  2012-12-05 12:15         ` Dario Faggioli
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2012-12-05 12:01 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Keir (Xen.org)

On Wed, 2012-12-05 at 11:54 +0000, Dario Faggioli wrote:
> On Tue, 2012-12-04 at 19:10 +0000, George Dunlap wrote: 
> > On 03/12/12 16:35, Dario Faggioli wrote:
> > > +            /* Avoid TRACE_* to avoid a lot of useless !tb_init_done checks */
> > > +            for_each_cpu(cpu, &mask)
> > > +            {
> > > +                struct {
> > > +                    unsigned cpu:8;
> > > +                } d;
> > > +                d.cpu = cpu;
> > > +                trace_var(TRC_CSCHED_TICKLE, 0,
> > > +                          sizeof(d),
> > > +                          (unsigned char*)&d);
> > 
> > Why not just TRC_1D()? 
> >
> As I tried to explain in the comment, I just wanted to avoid checking
> for !tb_init_done more than once, as this happens within a loop and, at
> least potentially, there may be more CPUs to tickle (and thus more calls
> to TRACE_1D).

If tb_init_done isn't marked volatile or anything like that isn't the
check hoisted out of the loop by the compiler?

> I take this comment of yours as you not thinking that is
> something worthwhile, right? If so, I can definitely turn this into a
> "standard" TRACE_1D() call.

Or maybe consider __TRACE_1D and friends which omit the check?

Ian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 12:01       ` Ian Campbell
@ 2012-12-05 12:15         ` Dario Faggioli
  2012-12-05 12:20           ` Ian Campbell
  2012-12-05 12:38           ` Mats Petersson
  0 siblings, 2 replies; 18+ messages in thread
From: Dario Faggioli @ 2012-12-05 12:15 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel, Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1245 bytes --]

On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: 
> > As I tried to explain in the comment, I just wanted to avoid checking
> > for !tb_init_done more than once, as this happens within a loop and, at
> > least potentially, there may be more CPUs to tickle (and thus more calls
> > to TRACE_1D).
> 
> If tb_init_done isn't marked volatile or anything like that isn't the
> check hoisted out of the loop by the compiler?
> 
Good point. As they're all macros, yes, I think that is something very
likely to happen... Although, I haven't checked the generated code, I'll
take a look. Thanks.

> > I take this comment of yours as you not thinking that is
> > something worthwhile, right? If so, I can definitely turn this into a
> > "standard" TRACE_1D() call.
> 
> Or maybe consider __TRACE_1D and friends which omit the check?
> 
Mmm... It may well be me, but my

$ grep __TRACE xen/* -R

does not show any results... What am I missing?

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs
  2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
  2012-12-03 17:12   ` Ian Campbell
@ 2012-12-05 12:16   ` George Dunlap
  1 sibling, 0 replies; 18+ messages in thread
From: George Dunlap @ 2012-12-05 12:16 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Keir (Xen.org)

[-- Attachment #1: Type: text/plain, Size: 3868 bytes --]

On 03/12/12 16:34, Dario Faggioli wrote:
> Right now, when a VCPU wakes-up, we check if the it should preempt
> what is running on the PCPU, and whether or not the waking VCPU can
> be migrated (by tickling some idlers). However, this can result in
> suboptimal or even wrong behaviour, as explained here:
>
>   http://lists.xen.org/archives/html/xen-devel/2012-10/msg01732.html
>
> This change, instead, when deciding what PCPUs to tickle upon VCPU
> wake-up, considers both what it is likely to happen on the PCPU
> where the wakeup occurs, as well as whether or not there are idle
> PCPUs where to run the waking VCPU.
> In fact, if there are idlers where the new VCPU can run, we can
> avoid interrupting the running VCPU. OTOH, in case there aren't
> any of these PCPUs, preemption and migration are the way to go.
>
> This has been tested by running the following benchmarks inside 2,
> 6 and 10 VMs concurrently, on a shared host, each with 2 VCPUs and
> 960 MB of memory (host has 16 ways and 12 GB RAM).
>
> 1) All VMs had 'cpus="all"' in their config file.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 50.078467 +/- 1.6676162 | 49.704933 +/- 0.0277184 |
>   | 6   | 63.259472 +/- 0.1137586 | 62.227367 +/- 0.3880619 |
>   | 10  | 91.246797 +/- 0.1154008 | 91.174820 +/- 0.0928781 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 485.56333 +/- 6.0527356 | 525.57833 +/- 25.085826 |
>   | 6   | 401.36278 +/- 1.9745916 | 421.96111 +/- 9.0364048 |
>   | 10  | 294.43933 +/- 0.8064945 | 302.49033 +/- 0.2343978 |
> $ specjbb2005 ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 43150.63 +/- 1359.5616  | 42720.632 +/- 1937.4488 |
>   | 6   | 29274.29 +/- 1024.4042  | 29518.171 +/- 1014.5239 |
>   | 10  | 19061.28 +/- 512.88561  | 19050.141 +/- 458.77327 |
>
>
> 2) All VMs had their VCPUs statically pinned to the host's PCPUs.
>
> $ sysbench --test=cpu ... (time, lower is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 47.8211   +/- 0.0215504 | 47.826900 +/- 0.0077872 |
>   | 6   | 62.689122 +/- 0.0877173 | 62.764539 +/- 0.3882493 |
>   | 10  | 90.321097 +/- 1.4803867 | 89.974570 +/- 1.1437566 |
> $ sysbench --test=memory ... (throughput, higher is better)
>   | VMs | w/o this change         | w/ this change          |
>   | 2   | 550.97667 +/- 2.3512355 | 550.87000 +/- 0.8140792 |
>   | 6   | 443.15000 +/- 5.7471797 | 454.01056 +/- 8.4373466 |
>   | 10  | 313.89233 +/- 1.3237493 | 321.81167 +/- 0.3528418 |
> $ specjbb2005 ... (throughput, higher is better)
>   | 2   | 49591.057 +/- 952.93384 | 49610.98  +/- 1242.1675 |
>   | 6   | 33538.247 +/- 1089.2115 | 33682.222 +/- 1216.1078 |
>   | 10  | 21927.870 +/- 831.88742 | 21801.138 +/- 561.97068 |
>
>
> Numbers show how the change has either no or very limited impact
> (specjbb2005 case) or, when it does have some impact, that is an
> actual improvement in performances, especially in the
> sysbench-memory case.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

So I think the principle is good, but the resulting set of "if" 
statements is hard to figure out what's going on.

What do you think about re-arranging things, something like the attached?

This particular version I got rid of the stats, because they require 
if() statements that break up the flow.  If we really think they're 
useful, maybe we could have a separate block somewhere for them?

We could actually do without the idlers_empty entirely, as if we just 
remove the condition from the "else" block, the "right thing" will 
happen; however, it means several unnecessary cpumask operations on a 
busy system.

Thoughts?

  -George


[-- Attachment #2: xen_sched_credit_improve_tickling_of_idle_cpus --]
[-- Type: text/plain, Size: 3773 bytes --]

xen: sched_credit, improve tickling of idle CPUs

RFC: Re-organized ifs

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -249,54 +249,53 @@ static inline void
     struct csched_vcpu * const cur =
         CSCHED_VCPU(per_cpu(schedule_data, cpu).curr);
     struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
-    cpumask_t mask;
+    cpumask_t mask, idle_mask;
+    int idlers_empty;
 
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
-    if ( new->pri > cur->pri )
+    idlers_empty = cpumask_empty(prv->idlers);
+    /*
+     * If the pcpu is idle, or there are no idlers and the new
+     * vcpu is a higher priority than the old vcpu, run it here.
+     *
+     * If there are idle cpus, first try to find one suitable to run
+     * "new", so we can avoid preempting cur.  If we cannot find a
+     * suitable idler on which to run "new", run it here, but try to
+     * find a suitable idler on which to run "cur" instead.
+     */
+    if ( cur->pri == CSCHED_PRI_IDLE
+         || (idlers_empty && new->pri > cur->pri) )
     {
-        if ( cur->pri == CSCHED_PRI_IDLE )
-            SCHED_STAT_CRANK(tickle_local_idler);
-        else if ( cur->pri == CSCHED_PRI_TS_OVER )
-            SCHED_STAT_CRANK(tickle_local_over);
-        else if ( cur->pri == CSCHED_PRI_TS_UNDER )
-            SCHED_STAT_CRANK(tickle_local_under);
-        else
-            SCHED_STAT_CRANK(tickle_local_other);
-
         cpumask_set_cpu(cpu, &mask);
     }
+    else if (!idlers_empty)
+    {
+        /* Check whether or not there are idlers that can run new */
+        cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
 
-    /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
-     * let them know there is runnable work in the system...
-     */
-    if ( cur->pri > CSCHED_PRI_IDLE )
-    {
-        if ( cpumask_empty(prv->idlers) )
+        /* If there are no suitable idlers for new, and it's higher
+         * priority than cur, wake up the current cpu, but also
+         * look for idlers suitable for cur. */
+        if (cpumask_empty(&idle_mask) && new->pri > cur->pri)
         {
-            SCHED_STAT_CRANK(tickle_idlers_none);
+            cpumask_set_cpu(cpu, &mask);
+            cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
         }
-        else
+
+        /* Which of the idlers shall we wake up? */
+        if ( !cpumask_empty(&idle_mask) )
         {
-            cpumask_t idle_mask;
-
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
-            if ( !cpumask_empty(&idle_mask) )
+            SCHED_STAT_CRANK(tickle_idlers_some);
+            if ( opt_tickle_one_idle )
             {
-                SCHED_STAT_CRANK(tickle_idlers_some);
-                if ( opt_tickle_one_idle )
-                {
-                    this_cpu(last_tickle_cpu) = 
-                        cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
-                    cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
-                }
-                else
-                    cpumask_or(&mask, &mask, &idle_mask);
+                this_cpu(last_tickle_cpu) = 
+                    cpumask_cycle(this_cpu(last_tickle_cpu), &idle_mask);
+                cpumask_set_cpu(this_cpu(last_tickle_cpu), &mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
+            else
+                cpumask_or(&mask, &mask, &idle_mask);
         }
     }
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 12:15         ` Dario Faggioli
@ 2012-12-05 12:20           ` Ian Campbell
  2012-12-05 12:25             ` George Dunlap
  2012-12-05 12:38           ` Mats Petersson
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2012-12-05 12:20 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Keir (Xen.org)

On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote:
> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote: 
> > > As I tried to explain in the comment, I just wanted to avoid checking
> > > for !tb_init_done more than once, as this happens within a loop and, at
> > > least potentially, there may be more CPUs to tickle (and thus more calls
> > > to TRACE_1D).
> > 
> > If tb_init_done isn't marked volatile or anything like that isn't the
> > check hoisted out of the loop by the compiler?
> > 
> Good point. As they're all macros, yes, I think that is something very
> likely to happen... Although, I haven't checked the generated code, I'll
> take a look. Thanks.
> 
> > > I take this comment of yours as you not thinking that is
> > > something worthwhile, right? If so, I can definitely turn this into a
> > > "standard" TRACE_1D() call.
> > 
> > Or maybe consider __TRACE_1D and friends which omit the check?
> > 
> Mmm... It may well be me, but my
> 
> $ grep __TRACE xen/* -R
> 
> does not show any results... What am I missing?

I meant to define + use those macros.

Ian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 12:20           ` Ian Campbell
@ 2012-12-05 12:25             ` George Dunlap
  0 siblings, 0 replies; 18+ messages in thread
From: George Dunlap @ 2012-12-05 12:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Keir (Xen.org), xen-devel, Dario Faggioli

On 05/12/12 12:20, Ian Campbell wrote:
> On Wed, 2012-12-05 at 12:15 +0000, Dario Faggioli wrote:
>> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote:
>>>> As I tried to explain in the comment, I just wanted to avoid checking
>>>> for !tb_init_done more than once, as this happens within a loop and, at
>>>> least potentially, there may be more CPUs to tickle (and thus more calls
>>>> to TRACE_1D).
>>> If tb_init_done isn't marked volatile or anything like that isn't the
>>> check hoisted out of the loop by the compiler?
>>>
>> Good point. As they're all macros, yes, I think that is something very
>> likely to happen... Although, I haven't checked the generated code, I'll
>> take a look. Thanks.
>>
>>>> I take this comment of yours as you not thinking that is
>>>> something worthwhile, right? If so, I can definitely turn this into a
>>>> "standard" TRACE_1D() call.
>>> Or maybe consider __TRACE_1D and friends which omit the check?
>>>
>> Mmm... It may well be me, but my
>>
>> $ grep __TRACE xen/* -R
>>
>> does not show any results... What am I missing?
> I meant to define + use those macros.

Well ATM there would be only one user -- and "trace_var(..., 
sizeof(cpu), &cpu);" is probably just as pretty as __TRACE_1D(..., cpu).

I wouldn't oppose such a patch, but I don't think it should be required 
until we want to use "__TRACE_(N>2)D".

  -George

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3 of 3] xen: sched_credit: add some tracing
  2012-12-05 12:15         ` Dario Faggioli
  2012-12-05 12:20           ` Ian Campbell
@ 2012-12-05 12:38           ` Mats Petersson
  1 sibling, 0 replies; 18+ messages in thread
From: Mats Petersson @ 2012-12-05 12:38 UTC (permalink / raw)
  To: xen-devel

On 05/12/12 12:15, Dario Faggioli wrote:
> On Wed, 2012-12-05 at 12:01 +0000, Ian Campbell wrote:
>>> As I tried to explain in the comment, I just wanted to avoid checking
>>> for !tb_init_done more than once, as this happens within a loop and, at
>>> least potentially, there may be more CPUs to tickle (and thus more calls
>>> to TRACE_1D).
>> If tb_init_done isn't marked volatile or anything like that isn't the
>> check hoisted out of the loop by the compiler?
>>
> Good point. As they're all macros, yes, I think that is something very
> likely to happen... Although, I haven't checked the generated code, I'll
> take a look. Thanks.
But if there is a call to an opaque function (any function the compiler 
doesn't "know" what it does in the current compile unit - or one where 
the compiler can't determine that the global variable isn't being 
modified, e.g. anything that uses pointers the compiler can't trace, 
etc) in between, the compiler must NOT move the actual read of a global 
variable. So it would be worth checking the compile output.

--
Mats
>
>>> I take this comment of yours as you not thinking that is
>>> something worthwhile, right? If so, I can definitely turn this into a
>>> "standard" TRACE_1D() call.
>> Or maybe consider __TRACE_1D and friends which omit the check?
>>
> Mmm... It may well be me, but my
>
> $ grep __TRACE xen/* -R
>
> does not show any results... What am I missing?
>
> Dario
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-12-05 12:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-03 16:34 [PATCH 0 of 3] xen: sched_credit: fix tickling and add some tracing Dario Faggioli
2012-12-03 16:34 ` [PATCH 1 of 3] xen: sched_credit, improve tickling of idle CPUs Dario Faggioli
2012-12-03 17:12   ` Ian Campbell
2012-12-03 18:26     ` Dario Faggioli
2012-12-05 12:16   ` George Dunlap
2012-12-03 16:34 ` [PATCH 2 of 3] xen: tracing: introduce per-scheduler trace event IDs Dario Faggioli
2012-12-04 18:53   ` George Dunlap
2012-12-04 18:55     ` George Dunlap
2012-12-05 11:57     ` Dario Faggioli
2012-12-03 16:35 ` [PATCH 3 of 3] xen: sched_credit: add some tracing Dario Faggioli
2012-12-04 19:10   ` George Dunlap
2012-12-05 11:54     ` Dario Faggioli
2012-12-05 11:51       ` George Dunlap
2012-12-05 12:01       ` Ian Campbell
2012-12-05 12:15         ` Dario Faggioli
2012-12-05 12:20           ` Ian Campbell
2012-12-05 12:25             ` George Dunlap
2012-12-05 12:38           ` Mats Petersson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).