From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Uma Sharma <uma.sharma523@gmail.com>
Subject: [PATCH 8/9] xen: sched: allow for choosing credit2 runqueues configuration at boot
Date: Tue, 29 Sep 2015 18:56:25 +0200
Message-ID: <20150929165625.17589.17838.stgit@Solace.station>
In-Reply-To: <20150929164726.17589.96920.stgit@Solace.station>

Credit2 uses CPU topology to decide how to arrange its
internal runqueues. Before this change, only 'one runqueue
per socket' was allowed. However, experiments have shown that,
for instance, having one runqueue per physical core improves
performance, especially when hyperthreading is available.

In general, it makes sense to allow users to pick one runqueue
arrangement at boot time, so that:
 - more experiments can easily be performed to better
   assess and improve performance;
 - one can select the best configuration for their specific
   use case and/or hardware.

This patch enables the above.
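
As a rough illustration of the boot-time selection (booting with
`credit2_runqueue=core` or `credit2_runqueue=socket`), the sketch below
mimics the exact-match behaviour of the parser outside of Xen. The names
`demo_parse_runqueue` and `DEMO_RUNQUEUE_*` are hypothetical and exist
only for this example; the real code uses custom_param() and printk().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for OPT_RUNQUEUE_CORE / OPT_RUNQUEUE_SOCKET. */
enum demo_runqueue {
    DEMO_RUNQUEUE_CORE   = 1,
    DEMO_RUNQUEUE_SOCKET = 2
};

/*
 * Return the arrangement selected by the string s. Only the exact
 * strings "core" and "socket" are accepted; anything else leaves the
 * current setting untouched (the patch also prints a warning then).
 */
static int demo_parse_runqueue(const char *s, int current)
{
    assert(s != NULL);

    if ( strcmp(s, "core") == 0 )
        return DEMO_RUNQUEUE_CORE;
    if ( strcmp(s, "socket") == 0 )
        return DEMO_RUNQUEUE_SOCKET;

    return current;
}
```

With this sketch, an unrecognized value such as "cores" keeps whatever
arrangement was already in effect, i.e., the per-socket default.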

Note that, to correctly arrange runqueues per-core, just
checking cpu_to_core() on the host CPUs is not enough.
In fact, cores (and hyperthreads) on different sockets can
have the same core (and thread) IDs! We therefore need to
check whether the full topology of two CPUs matches before
putting them in the same runqueue.
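
To make the point about repeated core IDs concrete, here is a minimal
standalone sketch. The topology table is made up for illustration
(2 sockets, 2 cores each, 2 threads per core) and stands in for
cpu_to_socket() and cpu_to_core(); `core_id_only()` is a hypothetical
name for the insufficient check.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up topology, for illustration only. */
struct demo_topo {
    unsigned int socket;
    unsigned int core;
};

static const struct demo_topo topo[] = {
    { 0, 0 }, { 0, 0 },   /* cpus 0-1: socket 0, core 0 */
    { 0, 1 }, { 0, 1 },   /* cpus 2-3: socket 0, core 1 */
    { 1, 0 }, { 1, 0 },   /* cpus 4-5: socket 1, core 0 */
    { 1, 1 }, { 1, 1 },   /* cpus 6-7: socket 1, core 1 */
};
#define NR_DEMO_CPUS (sizeof(topo) / sizeof(topo[0]))

static bool same_socket(unsigned int cpua, unsigned int cpub)
{
    assert(cpua < NR_DEMO_CPUS && cpub < NR_DEMO_CPUS);
    return topo[cpua].socket == topo[cpub].socket;
}

/* Insufficient check: compares core IDs only, ignoring the socket. */
static bool core_id_only(unsigned int cpua, unsigned int cpub)
{
    assert(cpua < NR_DEMO_CPUS && cpub < NR_DEMO_CPUS);
    return topo[cpua].core == topo[cpub].core;
}

/* Full check: the socket must match as well as the core ID. */
static bool same_core(unsigned int cpua, unsigned int cpub)
{
    return same_socket(cpua, cpub) && core_id_only(cpua, cpub);
}
```

In this table, cpus 0 and 4 both have core ID 0, so core_id_only(0, 4)
is true even though they sit on different sockets; same_core(0, 4) is
false, which is the behaviour wanted for per-core runqueues.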

Note also that the default for credit2 has, until now, been
per-socket runqueues (although that was not functional). This
patch leaves things that way, to avoid mixing policy and
technical changes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Uma Sharma <uma.sharma523@gmail.com>
---
 docs/misc/xen-command-line.markdown |   11 +++++++
 xen/common/sched_credit2.c          |   57 ++++++++++++++++++++++++++++++++---
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a2e427c..71315b8 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -467,6 +467,17 @@ combination with the `low_crashinfo` command line option.
 ### credit2\_load\_window\_shift
 > `= <integer>`
 
+### credit2\_runqueue
+> `= socket | core`
+
+> Default: `socket`
+
+Specify how host CPUs are arranged in runqueues. Runqueues are kept
+balanced with respect to the load generated by the vCPUs running on
+them. Smaller runqueues (as with `core`) mean more accurate load
+balancing (for instance, it will deal better with hyperthreading),
+but also more overhead.
+
 ### dbgp
 > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 38f382e..025626f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -82,10 +82,6 @@
  * Credits are "reset" when the next vcpu in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
- *
- * The plan is for all cores that share an L2 will share the same
- * runqueue.  At the moment, there is one global runqueue for all
- * cores.
  */
 
 /*
@@ -194,6 +190,41 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
 integer_param("credit2_balance_over", opt_overload_balance_tolerance);
 
 /*
+ * Runqueue organization.
+ *
+ * The various cpus are to be assigned each one to a runqueue, and we
+ * want that to happen based on topology. At the moment, it is possible
+ * to choose to arrange runqueues to be:
+ *
+ * - per-core: meaning that there will be one runqueue per each physical
+ *             core of the host. This will happen if the opt_runqueue
+ *             parameter is set to 'core';
+ *
+ * - per-socket: meaning that there will be one runqueue per each physical
+ *               socket (AKA package, which often, but not always, also
+ *               matches a NUMA node) of the host. This will happen if
+ *               the opt_runqueue parameter is set to 'socket';
+ *
+ * Depending on the value of opt_runqueue, therefore, cpus that are part of
+ * either the same physical core, or of the same physical socket, will be
+ * put together to form runqueues.
+ */
+#define OPT_RUNQUEUE_CORE   1
+#define OPT_RUNQUEUE_SOCKET 2
+static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
+
+static void parse_credit2_runqueue(const char *s)
+{
+    if ( !strncmp(s, "core", 4) && !s[4] )
+        opt_runqueue = OPT_RUNQUEUE_CORE;
+    else if ( !strncmp(s, "socket", 6) && !s[6] )
+        opt_runqueue = OPT_RUNQUEUE_SOCKET;
+    else
+        printk("WARNING, unrecognized value of credit2_runqueue option!\n");
+}
+custom_param("credit2_runqueue", parse_credit2_runqueue);
+
+/*
  * Per-runqueue data
  */
 struct csched2_runqueue_data {
@@ -1985,6 +2016,17 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
+static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
+{
+    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
+}
+
+static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
+{
+    return same_socket(cpua, cpub) &&
+           cpu_to_core(cpua) == cpu_to_core(cpub);
+}
+
 static unsigned int
 cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
@@ -2017,7 +2059,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
         BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
                cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
 
-        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
+        if ( (opt_runqueue == OPT_RUNQUEUE_CORE &&
+              same_core(peer_cpu, cpu)) ||
+             (opt_runqueue == OPT_RUNQUEUE_SOCKET &&
+              same_socket(peer_cpu, cpu)) )
             break;
     }
 
@@ -2140,6 +2185,8 @@ csched2_init(struct scheduler *ops)
     printk(" load_window_shift: %d\n", opt_load_window_shift);
     printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
     printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
+    printk(" runqueues arrangement: per-%s\n",
+           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
 
     if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
     {
