From: Dario Faggioli <dfaggioli@suse.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>
Subject: [RFC PATCH v1 02/16] xen: Credit1: always steal from pcpus with runnable but not running vcpus
Date: Sat, 25 Aug 2018 01:35:26 +0200
Message-ID: <153515372661.8598.14583085291840756167.stgit@Istar.fritz.box>
In-Reply-To: <153515305655.8598.6054293649487840735.stgit@Istar.fritz.box>

Currently, if a pcpu is idle but has some runnable vcpus in its
runqueue, the other pcpus (e.g., when they go idle themselves) never
try to steal any of those runnable vcpus.

This is because Credit1 assumes that, when a vcpu wakes up, if the pcpu
on which it is queued is idle, that very pcpu is _always_ tickled, and
will _always_ have the chance to pick up someone from its own runqueue.

While true for now, this is going to change. In fact:
- we want to be able to tickle a pcpu different from the one where the
  vcpu has been queued, if that is better for load distribution (or for
  honoring soft-affinity);
- when introducing SMT-aware domain co-scheduling, it can happen that a
  pcpu, even if idle, can't pick up vcpus from its own runqueue, and
  hence we do want other pcpus to be able to try to steal and run them.

Let's, therefore, remove the logic implementing such an assumption,
making sure that runnable vcpus can be picked up from the runqueue of
whatever pcpu they happen to be queued on.

We also make it clearer, in the tracing output, whether a cpu is being
considered or skipped (and, if the latter, why) during load balancing.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
TODO:
- for tracing, deal with xentrace's formats too;
- we now check a lot of pcpus. Although we do that without taking their
  runqueue locks, the overhead may be significant on big systems. In
  order to avoid that, we could introduce a new cpumask, tracking the
  'overloaded' pcpus (i.e., the pcpus which have at least one queued,
  runnable vcpu), and only look at them (see the sketch below).
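
  A purely illustrative sketch of that idea follows (not part of this
  patch: the mask name, its placement, and the helper names are all
  made up). The mask would be kept up to date from the same places that
  already maintain nr_runnable, and then used when building the
  'workers' mask in csched_load_balance():

    /*
     * Sketch only: in reality the mask would live in struct
     * csched_private, next to prv->idlers; 'overloaded_cpus' and the
     * helpers below are hypothetical names.
     */
    static cpumask_t overloaded_cpus;

    /* Callers hold the pcpu's runqueue lock, like inc/dec_nr_runnable(). */
    static inline void mark_overloaded(struct csched_pcpu *spc, unsigned int cpu)
    {
        if ( spc->nr_runnable++ == 0 )
            cpumask_set_cpu(cpu, &overloaded_cpus);   /* 0 -> 1 */
    }

    static inline void clear_overloaded(struct csched_pcpu *spc, unsigned int cpu)
    {
        ASSERT(spc->nr_runnable >= 1);
        if ( --spc->nr_runnable == 0 )
            cpumask_clear_cpu(cpu, &overloaded_cpus); /* 1 -> 0 */
    }

    /* In csched_load_balance(), only scan overloaded pcpus of the node: */
    cpumask_and(&workers, &overloaded_cpus, &node_to_cpumask(peer_node));
    __cpumask_clear_cpu(cpu, &workers);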
---
 tools/xentrace/xenalyze.c |   11 +++++++----
 xen/common/sched_credit.c |   29 +++++++++++++----------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 5ed0a12327..a1e2531945 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7655,12 +7655,15 @@ void sched_process(struct pcpu_info *p)
         case TRC_SCHED_CLASS_EVT(CSCHED, 11): /* STEAL_CHECK   */
             if(opt.dump_all) {
                 struct {
-                    unsigned int peer_cpu, check;
+                    unsigned int peer_cpu;
+                    int check;
                 } *r = (typeof(r))ri->d;
 
-                printf(" %s csched:load_balance %s %u\n",
-                       ri->dump_header, r->check ? "checking" : "skipping",
-                       r->peer_cpu);
+                printf(" %s csched:load_balance cpu %u: %s (nr_runnable=%d)\n",
+                       ri->dump_header, r->peer_cpu,
+                       r->check == 0 ? "skipping" :
+                           (r->check < 0 ? "locked" : "checking" ),
+                       r->check < 0 ? -r->check : r->check);
             }
             break;
         /* CREDIT 2 (TRC_CSCHED2_xxx) */
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b8765ac6df..48d80993b1 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1606,11 +1606,8 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
 
     ASSERT(peer_pcpu != NULL);
 
-    /*
-     * Don't steal from an idle CPU's runq because it's about to
-     * pick up work from it itself.
-     */
-    if ( unlikely(is_idle_vcpu(curr_on_cpu(peer_cpu))) )
+    /* Don't steal from pcpus which have no runnable vcpu queued. */
+    if ( unlikely(peer_pcpu->nr_runnable == 0) )
         goto out;
 
     list_for_each( iter, &peer_pcpu->runq )
@@ -1695,15 +1692,15 @@ csched_load_balance(struct csched_private *prv, int cpu,
 
     /*
      * Let's look around for work to steal, taking both hard affinity
-     * and soft affinity into account. More specifically, we check all
-     * the non-idle CPUs' runq, looking for:
+     * and soft affinity into account. More specifically, we check the
+     * runqs of those CPUs that have runnable vCPUs queued, looking for:
      *  1. any "soft-affine work" to steal first,
      *  2. if not finding anything, any "hard-affine work" to steal.
      */
     for_each_affinity_balance_step( bstep )
     {
         /*
-         * We peek at the non-idling CPUs in a node-wise fashion. In fact,
+         * We peek at the CPUs in a node-wise fashion. In fact,
          * it is more likely that we find some affine work on our same
          * node, not to mention that migrating vcpus within the same node
          * could well expected to be cheaper than across-nodes (memory
@@ -1713,8 +1710,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
         do
         {
             /* Select the pCPUs in this node that have work we can steal. */
-            cpumask_andnot(&workers, online, prv->idlers);
-            cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
+            cpumask_and(&workers, online, &node_to_cpumask(peer_node));
             __cpumask_clear_cpu(cpu, &workers);
 
             first_cpu = cpumask_cycle(prv->balance_bias[peer_node], &workers);
@@ -1723,6 +1719,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
             peer_cpu = first_cpu;
             do
             {
+                const struct csched_pcpu * const pspc = CSCHED_PCPU(peer_cpu);
                 spinlock_t *lock;
 
                 /*
@@ -1735,9 +1732,9 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  * In more details:
                  * - if we race with dec_nr_runnable(), we may try to take the
                  *   lock and call csched_runq_steal() for no reason. This is
-                 *   not a functional issue, and should be infrequent enough.
-                 *   And we can avoid that by re-checking nr_runnable after
-                 *   having grabbed the lock, if we want;
+                 *   not a functional issue and, in any case, we re-check
+                 *   nr_runnable inside csched_runq_steal, after grabbing the
+                 *   lock;
                  * - if we race with inc_nr_runnable(), we skip a pCPU that may
                  *   have runnable vCPUs in its runqueue, but that's not a
                  *   problem because:
@@ -1751,7 +1748,7 @@ csched_load_balance(struct csched_private *prv, int cpu,
                  *     optimization), it the pCPU would schedule right after we
                  *     have taken the lock, and hence block on it.
                  */
-                if ( CSCHED_PCPU(peer_cpu)->nr_runnable == 0 )
+                if ( pspc->nr_runnable == 0 )
                 {
                     TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, /* skipp'n */ 0);
                     goto next_cpu;
@@ -1769,11 +1766,11 @@ csched_load_balance(struct csched_private *prv, int cpu,
                 if ( !lock )
                 {
                     SCHED_STAT_CRANK(steal_trylock_failed);
-                    TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, /* skip */ 0);
+                    TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, -pspc->nr_runnable);
                     goto next_cpu;
                 }
 
-                TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, /* checked */ 1);
+                TRACE_2D(TRC_CSCHED_STEAL_CHECK, peer_cpu, pspc->nr_runnable);
 
                 /* Any work over there to steal? */
                 speer = cpumask_test_cpu(peer_cpu, online) ?

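Purely for illustration (not part of the patch), here is a tiny
stand-alone program showing how the new STEAL_CHECK payload is meant to
be read, mirroring the xenalyze.c hunk above: 0 means the pcpu was
skipped, a positive value means it was checked (and carries
nr_runnable), and a negative value means its runqueue lock was busy
(with -nr_runnable recorded, so the counter is still visible):

    #include <stdio.h>

    /* Decode the second word of a TRC_CSCHED_STEAL_CHECK record. */
    static void dump_steal_check(unsigned int peer_cpu, int check)
    {
        printf("csched:load_balance cpu %u: %s (nr_runnable=%d)\n",
               peer_cpu,
               check == 0 ? "skipping" : (check < 0 ? "locked" : "checking"),
               check < 0 ? -check : check);
    }

    int main(void)
    {
        dump_steal_check(3, 0);   /* nothing runnable: skipped          */
        dump_steal_check(5, -2);  /* lock busy, 2 runnable vcpus queued */
        dump_steal_check(7, 4);   /* checked, 4 runnable vcpus queued   */
        return 0;
    }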
