From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xen.org
Cc: Marcus Granado <Marcus.Granado@eu.citrix.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Anil Madhavapeddy <anil@recoil.org>,
George Dunlap <george.dunlap@eu.citrix.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Daniel De Graaf <dgdegra@tycho.nsa.gov>,
Matt Wilson <msw@amazon.com>
Subject: [PATCH 00 of 10 v2] NUMA aware credit scheduling
Date: Wed, 19 Dec 2012 20:07:16 +0100
Message-ID: <patchbomb.1355944036@Solace>
Hello Everyone,
Here is take 2 of the NUMA aware credit scheduling series. Sorry it took a
while, but I had to take care of those nasty bugs causing scheduling anomalies,
as they were getting in the way and skewing the numbers when I was trying to
evaluate the performance of this series! :-)
I also rewrote most of the core of the two-step vCPU and node affinity
balancing algorithm, as per George's suggestion during the last round, to try
to squeeze out a little more performance improvement.
As already and repeatedly said, what the series does is provide the (credit)
scheduler with knowledge of a domain's node-affinity. The scheduler will then
always try to run the domain's vCPUs on one of those nodes first, and only if
that turns out to be impossible does it fall back to the old behaviour. (BTW,
for any update on the status of my "quest" to improve NUMA support in Xen, see
http://wiki.xen.org/wiki/Xen_NUMA_Roadmap.)
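To make the two-step idea a bit more concrete, here is a toy, self-contained
sketch of the candidate pCPU selection it boils down to. This is NOT the code
from patch 3: every name, type and tie-breaking rule below is made up for
illustration only, and the real scheduler obviously has its own load-based
heuristics on top of this.

/*
 * Toy sketch of the two-step pCPU selection (NOT the actual patch code;
 * all names and types here are invented for this example).
 *
 * Step 1: look for a pCPU in the intersection of the vCPU's CPU
 *         affinity and the domain's node-affinity.
 * Step 2: if that gives nothing, fall back to the plain CPU affinity,
 *         i.e., the old behaviour.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t cpumask_t;            /* one bit per pCPU, demo only */

static int pick_cpu_from(cpumask_t candidates, cpumask_t idle)
{
    /* Prefer an idle candidate, otherwise accept any candidate. */
    cpumask_t best = (candidates & idle) ? (candidates & idle) : candidates;

    if (!best)
        return -1;
    /* Lowest set bit, standing in for the real scheduler's heuristics. */
    return __builtin_ctzll(best);
}

static int two_step_pick_cpu(cpumask_t cpu_affinity,
                             cpumask_t node_affinity_cpus,
                             cpumask_t idle_cpus)
{
    /* Step 1: pCPUs satisfying both the CPU and the node affinity. */
    int cpu = pick_cpu_from(cpu_affinity & node_affinity_cpus, idle_cpus);

    if (cpu >= 0)
        return cpu;

    /* Step 2: node-affinity cannot be honoured, use CPU affinity only. */
    return pick_cpu_from(cpu_affinity, idle_cpus);
}

int main(void)
{
    /* 2 nodes x 8 pCPUs: node 0 = CPUs 0-7, node 1 = CPUs 8-15. */
    cpumask_t node1 = 0xff00, all = 0xffff;

    /* Domain with node-affinity to node 1, no pinning, CPU 9 idle:
     * step 1 succeeds and we stay on the "right" node (picks CPU 9). */
    printf("picked CPU %d\n", two_step_pick_cpu(all, node1, 1u << 9));

    /* Node-affinity to node 1 but vCPU pinned to node 0: the
     * intersection is empty, so we fall back to the old behaviour and
     * pick the idle CPU from node 0 (picks CPU 2). */
    printf("picked CPU %d\n", two_step_pick_cpu(0x00ff, node1, 1u << 2));

    return 0;
}

Again, take this just as an illustration of the two-step fallback logic
described above, not as a description of the actual patch.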
I re-ran my usual benchmark, SpecJBB2005, plus some others, namely a few
configurations of sysbench and lmbench. A little bit more about them follows:
* SpecJBB is all about throughput, so pinning is likely the ideal solution.
* Sysbench-memory measures the time it takes to write a fixed amount of
  memory (and it is then the resulting throughput that is reported). We
  expect locality to be important here, but at the same time the potential
  imbalances due to pinning could have a say in it.
* Lmbench-proc measures the time it takes for a process to fork a fixed
  number of children. This is much more about latency than throughput, with
  locality of memory accesses playing a smaller role and, again, imbalances
  due to pinning being a potential issue.
On a 2-node, 16-core system, where I can have 2 to 10 VMs (2 vCPUs each)
executing the benchmarks concurrently, here's what I get:
 ----------------------------------------------------
 | SpecJBB2005, throughput (the higher the better)  |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |   43451.853 | 49876.750 |       49693.653 |
 |    6 |   29368.589 | 33782.132 |       33692.936 |
 |   10 |   19138.934 | 21950.696 |       21413.311 |
 ----------------------------------------------------
 | Sysbench memory, throughput (the higher the better)
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |   484.42167 | 552.32667 |       552.86167 |
 |    6 |   404.43667 | 440.00056 |       449.42611 |
 |   10 |   296.45600 | 315.51733 |       331.49067 |
 ----------------------------------------------------
 | LMBench proc, latency (the lower the better)     |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |   824.00437 | 749.51892 |       741.42952 |
 |    6 |   942.39442 | 985.02761 |       974.94700 |
 |   10 |   1254.3121 | 1363.0792 |       1301.2917 |
 ----------------------------------------------------
In terms of percentage performance increase/decrease, this means that NUMA
aware scheduling does as follows, compared to no affinity at all and to pinning:
 --------------------------------
 | SpecJBB2005 (throughput)     |
 --------------------------------
 | #VMs | No affinity | Pinning |
 --------------------------------
 |    2 |     +14.36% |  -0.36% |
 |    6 |     +14.72% |  -0.26% |
 |   10 |     +11.88% |  -2.44% |
 --------------------------------
 | Sysbench memory (throughput) |
 --------------------------------
 | #VMs | No affinity | Pinning |
 --------------------------------
 |    2 |     +14.12% |  +0.09% |
 |    6 |     +11.12% |  +2.14% |
 |   10 |     +11.81% |  +5.06% |
 --------------------------------
 | LMBench proc (latency)       |
 --------------------------------
 | #VMs | No affinity | Pinning |
 --------------------------------
 |    2 |     +10.02% |  +1.07% |
 |    6 |      +3.45% |  +1.02% |
 |   10 |      +2.94% |  +4.53% |
 --------------------------------
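(For the record, the percentages in the throughput tables are simply the
relative difference of the NUMA scheduling figure against each baseline, e.g.,
for the 2-VM SpecJBB row vs. no affinity:

  (49693.653 - 43451.853) / 43451.853 * 100 ~= +14.36% )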
The numbers seem to show that we are successfully taking advantage of both the
improved locality (compared to no affinity) and the greater flexibility that
the NUMA aware scheduling approach gives us (compared to pinning). In fact,
where only throughput is concerned (the SpecJBB case), it behaves almost on par
with pinning, and a lot better than no affinity at all. Moreover, we even
manage to do better than both of them when latency comes a bit more into the
game and the imbalances caused by pinning would make things worse than not
having any affinity at all, as in the sysbench and, especially, the LMBench
case.
Here are the patches included in the series. I marked with a '*' the ones that
already received one or more acks during v1. However, some patches were
significantly reworked since then; in those cases I dropped the acks and left
them with my SOB only, as I think they definitely need to be re-reviewed. :-)
* [ 1/10] xen, libxc: rename xenctl_cpumap to xenctl_bitmap
* [ 2/10] xen, libxc: introduce node maps and masks
[ 3/10] xen: sched_credit: let the scheduler know about node-affinity
[ 4/10] xen: allow for explicitly specifying node-affinity
* [ 5/10] libxc: allow for explicitly specifying node-affinity
* [ 6/10] libxl: allow for explicitly specifying node-affinity
[ 7/10] libxl: optimize the calculation of how many VCPUs can run on a candidate
* [ 8/10] libxl: automatic placement deals with node-affinity
* [ 9/10] xl: add node-affinity to the output of `xl list`
[10/10] docs: rearrange and update NUMA placement documentation
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)