* [PATCH 00 of 10 v2] NUMA aware credit scheduling
@ 2012-12-19 19:07 Dario Faggioli
From: Dario Faggioli @ 2012-12-19 19:07 UTC
  To: xen-devel
  Cc: Marcus Granado, Dan Magenheimer, Ian Campbell, Anil Madhavapeddy,
	George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson,
	Jan Beulich, Daniel De Graaf, Matt Wilson

Hello Everyone,

Here is take 2 of the NUMA aware credit scheduling series. Sorry it took a
bit, but I had to take care of those nasty bugs causing scheduling anomalies,
as they were getting in the way and messing up the numbers when trying to
evaluate the performance of this! :-)

I also rewrote most of the core of the two-step vCPU and node-affinity
balancing algorithm, as per George's suggestion during the last round, to try
to squeeze out a little more performance improvement.

As already said more than once, what the series does is provide the (credit)
scheduler with knowledge of a domain's node-affinity. It will then always try
to run the domain's vCPUs on one of those nodes first, and only if that turns
out to be impossible does it fall back to the old behaviour. (BTW, for any
update on the status of my "quest" to improve NUMA support in Xen, see
http://wiki.xen.org/wiki/Xen_NUMA_Roadmap.)
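
To make the idea a bit more concrete, here is a minimal, self-contained sketch
of that two-step logic. This is just an illustration put together for this
cover letter: names like pick_cpu() and the toy 64-bit cpumask are made up,
and it is not the code from patch 3, which works on the real Xen cpumask and
runqueue structures.

  /*
   * Illustrative sketch (not the actual Xen code): first restrict the
   * search for a CPU to the intersection of the vCPU's hard (pinning)
   * affinity and its node-affinity; only if that yields nothing, fall
   * back to the hard affinity alone, i.e., to the old behaviour.
   */
  #include <stdint.h>
  #include <stdio.h>

  typedef uint64_t cpumask_t;             /* one bit per physical CPU      */

  struct vcpu_affinity {
      cpumask_t cpu_affinity;             /* where the vCPU may run at all */
      cpumask_t node_affinity;            /* CPUs of the preferred node(s) */
  };

  /* Return the first idle CPU in 'candidates', or -1 if there is none. */
  static int first_idle(cpumask_t candidates, cpumask_t idle_cpus)
  {
      cpumask_t usable = candidates & idle_cpus;

      for (int cpu = 0; cpu < 64; cpu++)
          if (usable & (1ULL << cpu))
              return cpu;
      return -1;
  }

  /* Step 1: node-affinity && cpu-affinity; step 2: plain cpu-affinity. */
  static int pick_cpu(const struct vcpu_affinity *v, cpumask_t idle_cpus)
  {
      int cpu = first_idle(v->cpu_affinity & v->node_affinity, idle_cpus);

      if (cpu < 0)
          cpu = first_idle(v->cpu_affinity, idle_cpus);
      return cpu;
  }

  int main(void)
  {
      /* 2 nodes, 4 CPUs each: node 0 = CPUs 0-3, node 1 = CPUs 4-7. */
      struct vcpu_affinity v = {
          .cpu_affinity  = 0xffULL,       /* no pinning: all 8 CPUs    */
          .node_affinity = 0x0fULL,       /* memory lives on node 0    */
      };
      cpumask_t idle = 0xf0ULL;           /* only node 1 has idle CPUs */

      /* Node 0 has nothing idle, so we fall back and pick CPU 4. */
      printf("picked CPU %d\n", pick_cpu(&v, idle));
      return 0;
  }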

I re-ran my usual benchmark, SpecJBB2005, plus some others, namely a few
configurations of sysbench and lmbench. A little more about them follows:

 * SpecJBB is all about throughput, so pinning is likely the ideal solution.

 * Sysbench-memory measures the time it takes to write a fixed amount of
   memory (it is then the throughput that is reported). Here we expect
   locality to be important, but at the same time the potential imbalances
   due to pinning could have a say in it.

 * Lmbench-proc measures the time it takes for a process to fork a fixed
   number of children. This is much more about latency than throughput, with
   locality of memory accesses playing a smaller role and, again, imbalances
   due to pinning being a potential issue.

On a 2-node, 16-core system, where I can have 2 to 10 VMs (2 vCPUs each)
executing the benchmarks concurrently, here's what I get:

 -------------------------------------------------------
 | SpecJBB2005, throughput (the higher the better)     |
 -------------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling    |
 -------------------------------------------------------
 |    2 |   43451.853 | 49876.750 |    49693.653       |
 |    6 |   29368.589 | 33782.132 |    33692.936       |
 |   10 |   19138.934 | 21950.696 |    21413.311       |
 -------------------------------------------------------
 | Sysbench memory, throughput (the higher the better) |
 -------------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling    |
 -------------------------------------------------------
 |    2 |  484.42167  | 552.32667 |    552.86167       |
 |    6 |  404.43667  | 440.00056 |    449.42611       |
 |   10 |  296.45600  | 315.51733 |    331.49067       |
 -------------------------------------------------------
 | LMBench proc, latency (the lower the better)        |
 -------------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling    |
 -------------------------------------------------------
 |    2 |  824.00437  | 749.51892 |    741.42952       |
 |    6 |  942.39442  | 985.02761 |    974.94700       |
 |   10 |  1254.3121  | 1363.0792 |    1301.2917       |
 -------------------------------------------------------

Reasoning in terms of percent performance increase/decrease, this means NUMA
aware scheduling fares as follows, compared to no affinity at all and to
pinning (a small worked example of how these figures derive from the raw
numbers follows the tables):

     ----------------------------------
     | SpecJBB2005 (throughput)       |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |   +14.36%   |   -0.36%  |
     |    6 |   +14.72%   |   -0.26%  |
     |   10 |   +11.88%   |   -2.44%  |
     ----------------------------------
     | Sysbench memory (throughput)   |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |   +14.12%   |   +0.09%  |
     |    6 |   +11.12%   |   +2.14%  |
     |   10 |   +11.81%   |   +5.06%  |
     ----------------------------------
     | LMBench proc (latency)         |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |   +10.02%   |   +1.07%  |
     |    6 |    +3.45%   |   +1.02%  |
     |   10 |    +2.94%   |   +4.53%  |
     ----------------------------------
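
(For clarity: the percentages are, modulo rounding, just the relative
improvement over each baseline, i.e., relative increase for the throughput
benchmarks and relative reduction for the latency one, so that a positive
value always means NUMA scheduling doing better. A tiny sanity check, again
only an illustration and not part of the series:)

  #include <stdio.h>

  /* Relative improvement (in %) of 'numa' over 'base': plain increase for
   * higher-is-better metrics, reduction for lower-is-better ones. */
  static double improvement(double base, double numa, int higher_is_better)
  {
      return 100.0 * (higher_is_better ? (numa - base) / base
                                       : (base - numa) / base);
  }

  int main(void)
  {
      /* SpecJBB2005, 2 VMs, vs. the no-affinity baseline: +14.36% */
      printf("%+.2f%%\n", improvement(43451.853, 49693.653, 1));
      /* LMBench proc, 2 VMs, vs. the no-affinity baseline: +10.02% */
      printf("%+.2f%%\n", improvement(824.00437, 741.42952, 0));
      return 0;
  }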

The numbers seem to show that we are succeeding in taking advantage of both
the improved locality (when compared to no affinity) and the greater
flexibility that the NUMA aware scheduling approach gives us (when compared to
pinning). In fact, where throughput alone is concerned (the SpecJBB case), it
behaves almost on par with pinning, and a lot better than no affinity at all.
Moreover, we are even able to do better than both of them when latency comes a
little more into play and the imbalances caused by pinning make things worse
than having no affinity at all, as in the sysbench and, especially, the
LMBench case.

Here are the patches included in the series. I put a '*' next to the ones that
already received one or more acks during v1. However, some patches were
significantly reworked since then; in those cases I just dropped the acks and
left the patches with my SOB only, as I think they definitely need to be
re-reviewed. :-)

 * [ 1/10] xen, libxc: rename xenctl_cpumap to xenctl_bitmap
 * [ 2/10] xen, libxc: introduce node maps and masks
   [ 3/10] xen: sched_credit: let the scheduler know about node-affinity
   [ 4/10] xen: allow for explicitly specifying node-affinity
 * [ 5/10] libxc: allow for explicitly specifying node-affinity
 * [ 6/10] libxl: allow for explicitly specifying node-affinity
   [ 7/10] libxl: optimize the calculation of how many VCPUs can run on a candidate
 * [ 8/10] libxl: automatic placement deals with node-affinity
 * [ 9/10] xl: add node-affinity to the output of `xl list`
   [10/10] docs: rearrange and update NUMA placement documentation

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Thread overview: 57+ messages
2012-12-19 19:07 [PATCH 00 of 10 v2] NUMA aware credit scheduling Dario Faggioli
2012-12-19 19:07 ` [PATCH 01 of 10 v2] xen, libxc: rename xenctl_cpumap to xenctl_bitmap Dario Faggioli
2012-12-20  9:17   ` Jan Beulich
2012-12-20  9:35     ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 02 of 10 v2] xen, libxc: introduce node maps and masks Dario Faggioli
2012-12-20  9:18   ` Jan Beulich
2012-12-20  9:55     ` Dario Faggioli
2012-12-20 14:33     ` George Dunlap
2012-12-20 14:52       ` Jan Beulich
2012-12-20 15:13         ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 03 of 10 v2] xen: sched_credit: let the scheduler know about node-affinity Dario Faggioli
2012-12-20  6:44   ` Juergen Gross
2012-12-20  8:16     ` Dario Faggioli
2012-12-20  8:25       ` Juergen Gross
2012-12-20  8:33         ` Dario Faggioli
2012-12-20  8:39           ` Juergen Gross
2012-12-20  8:58             ` Dario Faggioli
2012-12-20 15:28             ` George Dunlap
2012-12-20 16:00               ` Dario Faggioli
2012-12-20  9:22           ` Jan Beulich
2012-12-20 15:56   ` George Dunlap
2012-12-20 17:12     ` Dario Faggioli
2012-12-20 16:48   ` George Dunlap
2012-12-20 18:18     ` Dario Faggioli
2012-12-21 14:29       ` George Dunlap
2012-12-21 16:07         ` Dario Faggioli
2012-12-20 20:21   ` George Dunlap
2012-12-21  0:18     ` Dario Faggioli
2012-12-21 14:56       ` George Dunlap
2012-12-21 16:13         ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 04 of 10 v2] xen: allow for explicitly specifying node-affinity Dario Faggioli
2012-12-21 15:17   ` George Dunlap
2012-12-21 16:17     ` Dario Faggioli
2013-01-03 16:05     ` Daniel De Graaf
2012-12-19 19:07 ` [PATCH 05 of 10 v2] libxc: " Dario Faggioli
2012-12-21 15:19   ` George Dunlap
2012-12-21 16:27     ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 06 of 10 v2] libxl: " Dario Faggioli
2012-12-21 15:30   ` George Dunlap
2012-12-21 16:18     ` Dario Faggioli
2012-12-21 17:02       ` Ian Jackson
2012-12-21 17:09         ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 07 of 10 v2] libxl: optimize the calculation of how many VCPUs can run on a candidate Dario Faggioli
2012-12-20  8:41   ` Ian Campbell
2012-12-20  9:24     ` Dario Faggioli
2012-12-21 16:00   ` George Dunlap
2012-12-21 16:23     ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 08 of 10 v2] libxl: automatic placement deals with node-affinity Dario Faggioli
2012-12-21 16:22   ` George Dunlap
2012-12-19 19:07 ` [PATCH 09 of 10 v2] xl: add node-affinity to the output of `xl list` Dario Faggioli
2012-12-21 16:34   ` George Dunlap
2012-12-21 16:54     ` Dario Faggioli
2012-12-19 19:07 ` [PATCH 10 of 10 v2] docs: rearrange and update NUMA placement documentation Dario Faggioli
2012-12-19 23:16 ` [PATCH 00 of 10 v2] NUMA aware credit scheduling Dario Faggioli
2013-01-11 12:19 ` Ian Campbell
2013-01-11 13:57   ` Dario Faggioli
2013-01-11 14:09     ` Ian Campbell
