xen-devel.lists.xenproject.org archive mirror
From: Dario Faggioli <dario.faggioli@citrix.com>
To: Juergen Gross <jgross@suse.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Tue, 28 Jul 2015 18:17:18 +0200
Message-ID: <1438100238.2889.135.camel@citrix.com>
In-Reply-To: <55B79B9C.6030505@suse.com>



On Tue, 2015-07-28 at 17:11 +0200, Juergen Gross wrote:
> On 07/28/2015 06:29 AM, Juergen Gross wrote:

> > I'll make some performance tests on a big machine (4 sockets, 60 cores,
> > 120 threads) regarding topology information:
> >
> > - bare metal
> > - "random" topology (like today)
> > - "simple" topology (all vcpus regarded as equal)
> > - "real" topology with all vcpus pinned
> >
> > This should show:
> >
> > - how intrusive would the topology patch(es) be?
> > - what is the performance impact of a "wrong" scheduling data base
> 
> On the above box I used a pvops kernel 4.2-rc4 plus a rather small patch
> (see attachment). I did 5 kernel builds in each environment:
> 
> make clean
> time make -j 120
> 
Right. If you have time, can you try '-j60' and '-j30' (maybe even -j45
and -j15, if you've got _a_lot_ of time! :-)).

I'm asking this because, with hyperthreading involved, I've sometimes
seen things being worse when *not* (over)saturating the CPU
capacity.

The explanation is that, if every vcpu is busy, meaning that every
thread is busy, it does not make much difference where you schedule the
busy vcpus.

OTOH, if only 1/2 of the threads are busy, a properly set up system will
effectively spread the load in such a way that each vcpu has a full core
available; a messed up one will, when trying to do the same, end up
scheduling stuff on siblings, even if there are idle cores available.
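
To make the half-loaded case concrete, here is a minimal sketch of the
effect. It is a toy model, not how either the Linux or the Xen scheduler
actually works: the 4-core/8-thread box, the `wrong_view` mapping and the
`spread()` policy are all made up for illustration.

```python
from collections import Counter

# Hypothetical host: 4 cores, 2 threads each; cpus 2k and 2k+1 are siblings.
real_core = {cpu: cpu // 2 for cpu in range(8)}

# Hypothetical wrong view: the scheduler believes (0,2), (1,3), (4,6), (5,7)
# are the sibling pairs.
wrong_view = {0: 0, 2: 0, 1: 1, 3: 1, 4: 2, 6: 2, 5: 3, 7: 3}

def spread(believed_core, n_busy):
    """Pick n_busy cpus: one thread per believed core first, then the rest."""
    by_core = {}
    for cpu in sorted(believed_core):
        by_core.setdefault(believed_core[cpu], []).append(cpu)
    picked = []
    rank = 0  # 0 = first thread of every core, 1 = second thread, ...
    while len(picked) < n_busy and any(rank < len(c) for c in by_core.values()):
        for core in sorted(by_core):
            if rank < len(by_core[core]) and len(picked) < n_busy:
                picked.append(by_core[core][rank])
        rank += 1
    return picked

def shared_cores(picked):
    """Count real cores that end up running more than one busy cpu."""
    load = Counter(real_core[cpu] for cpu in picked)
    return sum(1 for n in load.values() if n > 1)

for n_busy in (4, 8):
    print(n_busy, "busy:",
          "correct view ->", shared_cores(spread(real_core, n_busy)),
          "| wrong view ->", shared_cores(spread(wrong_view, n_busy)))
```

With 4 busy cpus the correct view leaves no core shared, while the wrong
view doubles up on two cores and leaves two idle; with all 8 busy, both
views give the same result, which is the "does not make much difference"
case.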

In this case, things are a bit more tricky. In fact, I've observed the
above while looking at the Xen scheduler. Here, instead, it is the
guest (dom0) scheduler that we are looking at, and, e.g., if the load is
small enough, Xen's scheduler will fix things up, at least up to a
certain extent.

It's worth a try anyway, I guess, if you have time, of course.
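
The -j sweep suggested above could be scripted along these lines. This is
only a sketch: the helper names are made up, it assumes it is run from the
top of the kernel tree, and it records elapsed time only (not the user and
system split that `time` reports).

```python
import subprocess
import time

def build_commands(job_counts):
    """Return one (clean, build) argv pair per requested -j value."""
    return [(["make", "clean"], ["make", f"-j{n}"]) for n in job_counts]

def time_build(clean_cmd, build_cmd, cwd="."):
    """Run the clean step, then return the build's elapsed time in seconds."""
    subprocess.run(clean_cmd, cwd=cwd, check=True)
    start = time.perf_counter()
    subprocess.run(build_cmd, cwd=cwd, check=True)
    return time.perf_counter() - start

# Example use (commented out, since it actually runs the builds):
# for clean_cmd, build_cmd in build_commands([120, 60, 45, 30, 15]):
#     print(" ".join(build_cmd), time_build(clean_cmd, build_cmd))
```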

> The first result of the 5 runs was always omitted as it would have to
> build up buffer caches etc. The Xen cases were all done in dom0, pinning
> of vcpus in the last scenario was done via dom0_vcpus_pin boot parameter
> of the hypervisor.
> 
> Here are the results (everything in seconds):
> 
>                      elapsed   user   system
> bare metal:            100    5770      805
> "random" topology:     283    6740    20700
> "simple" topology:     290    6740    22200
> "real" topology:       185    7800     8040
> 
> As expected bare metal is the best. Next is "real" topology with pinned
> vcpus (expected again - but system time already factor of 10 up!).
>
I also think that (massively) overloading biases things in favour of
pinning. In fact, pinning incurs less overhead, as there are no
scheduling decisions involved, and no migrations of vcpus among pcpus.
With the system oversubscribed to 200%, even in the non-pinning case
there shouldn't be many migrations, but certainly there will be some,
and they turn out to be pure overhead! In fact, they bring zero
benefit, as it's not possible that any of them will put the system in a
more advantageous state, performance-wise: we're fully loaded and we
want to stay fully loaded!
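
For reference, the slowdown factors implied by the quoted table can be
worked out as below. The numbers are copied from the table; the helper
itself is just an illustration.

```python
# Timings in seconds, copied from the results table quoted above.
RESULTS = {
    "bare metal": {"elapsed": 100, "user": 5770, "system": 805},
    "random":     {"elapsed": 283, "user": 6740, "system": 20700},
    "simple":     {"elapsed": 290, "user": 6740, "system": 22200},
    "real":       {"elapsed": 185, "user": 7800, "system": 8040},
}

def slowdown(results, baseline="bare metal"):
    """Return each column of each environment divided by the baseline's."""
    base = results[baseline]
    return {
        name: {column: value / base[column] for column, value in row.items()}
        for name, row in results.items()
        if name != baseline
    }

for name, row in slowdown(RESULTS).items():
    print(f"{name:>7}: " + "  ".join(f"{c} x{v:.2f}" for c, v in row.items()))
```

System time comes out at roughly 10x bare metal for the pinned "real"
case, and at roughly 26x and 28x for "random" and "simple".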

> What I didn't expect is: "random" is better than "simple" topology. 
>
Weird indeed!

> I
> could test some other topologies (e.g. everything on one socket, or even
> on one core), but I'm not sure this makes sense. I didn't check the
> exact topology result of the "random" case, maybe I'll do that tomorrow
> with another measurement.
> 
So, my test box looks like this:
cpu_topology           :
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       9        1        0
  5:       9        1        0
  6:      10        1        0
  7:      10        1        0
  8:       0        0        1
  9:       0        0        1
 10:       1        0        1
 11:       1        0        1
 12:       9        0        1
 13:       9        0        1
 14:      10        0        1
 15:      10        0        1

In Dom0, here's what I see _without_ any pinning:

root@Zhaman:~# for i in `seq 0 15`;do cat /sys/devices/system/cpu/cpu$i/topology/thread_siblings_list ;done
0-1
0-1
2-3
2-3
4-5
4-5
6-7
6-7
8-9
8-9
10-11
10-11
12-13
12-13
14-15
14-15

root@Zhaman:~# cat /proc/cpuinfo |grep "physical id"
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 1
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0

root@Zhaman:~# cat /proc/cpuinfo |grep "core id"
core id		: 0
core id		: 0
core id		: 1
core id		: 1
core id		: 9
core id		: 9
core id		: 10
core id		: 10
core id		: 0
core id		: 0
core id		: 1
core id		: 1
core id		: 9
core id		: 9
core id		: 10
core id		: 10

root@Zhaman:~# cat /proc/cpuinfo |grep "cpu cores"
cpu cores	: 4
<same for all cpus>

root@Zhaman:~# cat /proc/cpuinfo |grep "siblings" 
siblings	: 8
<same for all cpus>

So, basically, as far as Dom0 on my test box is concerned, "random"
actually matches the host topology.
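
The same information can be collected in one go from sysfs, which makes it
easier to compare against the host table. A minimal sketch, assuming the
usual Linux `topology/` files (`core_id`, `physical_package_id`,
`thread_siblings_list`) are present for each cpu; the function name is
made up.

```python
from pathlib import Path

def read_topology(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu: {core_id, package_id, thread_siblings}} read from sysfs."""
    rows = {}
    for cpu_dir in Path(sysfs_cpu_root).glob("cpu[0-9]*"):
        topo = cpu_dir / "topology"
        if not topo.is_dir():
            continue  # e.g. an offline cpu without topology data
        cpu = int(cpu_dir.name[len("cpu"):])
        rows[cpu] = {
            "core_id": int((topo / "core_id").read_text()),
            "package_id": int((topo / "physical_package_id").read_text()),
            "thread_siblings": (topo / "thread_siblings_list").read_text().strip(),
        }
    return dict(sorted(rows.items()))

# Example use inside the guest (commented out, as it needs a real sysfs):
# for cpu, row in read_topology().items():
#     print(cpu, row["core_id"], row["package_id"], row["thread_siblings"])
```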

Sure, without pinning, this looks equally wrong, as Xen's scheduler can
well execute, say, vcpu 0 and vcpu 4, which are not siblings, on the
same core. But then again, if the load is small, it just won't happen
(e.g., if there are only those two busy vcpus, Xen will put them on
non-sibling cores), while if it's too huge, it won't matter... :-/
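
As a small illustration of that mismatch, using the host table from
earlier in this mail: the vcpu-to-pcpu placements below are invented
examples, not something observed on the box.

```python
# Host topology from the cpu_topology table above: pcpu -> (socket, core).
CORE_IDS = [0, 0, 1, 1, 9, 9, 10, 10]
HOST = {cpu: (1 if cpu < 8 else 0, CORE_IDS[cpu % 8]) for cpu in range(16)}

def on_same_host_core(placement, vcpu_a, vcpu_b):
    """True if the pcpus currently running the two vcpus share a host core."""
    return HOST[placement[vcpu_a]] == HOST[placement[vcpu_b]]

# Hypothetical placements of vcpu 0 and vcpu 4 (non-siblings for the guest):
print(on_same_host_core({0: 0, 4: 1}, 0, 4))  # both on socket 1, core 0
print(on_same_host_core({0: 0, 4: 4}, 0, 4))  # socket 1, cores 0 and 9
```

The first placement is the bad one: the guest treats the two vcpus as
independent cores while they actually compete for one.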

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
