Re: [PATCH v6 00/10] vnuma introduction

From: Dario Faggioli <dario.faggioli@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com,
	msw@linux.com, lccycc123@gmail.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, JBeulich@suse.com,
	Elena Ufimtseva <ufimtseva@gmail.com>
Subject: Re: [PATCH v6 00/10] vnuma introduction
Date: Tue, 22 Jul 2014 16:03:44 +0200
Message-ID: <1406037824.17850.46.camel@Solace>
In-Reply-To: <20140718114834.GI7142@zion.uk.xensource.com>

On Fri, 2014-07-18 at 12:48 +0100, Wei Liu wrote:
> On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> > On Fri, 2014-07-18 at 10:53 +0100, Wei Liu wrote:

> > > I've also encountered this. I suspect that even if you disable SMT with
> > > cpuid in the config file, the cpu topology in the guest might still be wrong.
> > >
> > Can I ask why?
> > 
> 
> Because for a PV guest (currently) the guest kernel sees the real "ID"s
> for a cpu. See those "ID"s I change in my hacky patch.
> 
Right, now I see/remember it. Well, this is, I think, something we
should try to fix _independently_ from vNUMA, isn't it?

I mean, even right now, PV guests see completely random cache-sharing
topology, and that does (at least potentially) affect performance, as
the guest scheduler will make incorrect/inconsistent assumptions.

I'm not sure what the correct fix is. Probably something similar to what
you're doing in your hack... but, indeed, I think we should do something
about this!

> > > What do hwloc-ls and lscpu show? Do you see any weird topology like one
> > > core belongs to one node while three belong to another?
> > >
> > Yep, that would be interesting to see.
> > 
> > >  (I suspect not
> > > because your vcpus are already pinned to a specific node)
> > > 
> > Sorry, I'm not sure I follow here... Are you saying that things probably
> > work ok, but that is (only) because of pinning?
> 
> Yes, given that you derive numa memory allocation from cpu pinning or
> use a combination of cpu pinning, vcpu to vnode map and vnode to pnode
> map, in those cases those IDs might reflect the right topology.
> 
Well, pinning does not (and should not?) always happen as a consequence
of a virtual topology being used.

So, again, I don't think we should rely on pinning to have a sane and,
more importantly, consistent SMT and cache sharing topology.

Linux maintainers, any ideas?


BTW, I tried a few examples, on the following host:

root@benny:~# xl info -n
...
nr_cpus                : 8
max_cpu_id             : 15
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 3591
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
  4:       2        0        0
  5:       2        0        0
  6:       3        0        0
  7:       3        0        0
numa_info              :
node:    memsize    memfree    distances
   0:     34062      31029      10
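
To make the SMT pairing on this host explicit, here is a minimal Python
sketch that groups pCPUs by (socket, core) from the cpu_topology rows
above. The function name and the parsing approach are only an
illustration, not part of xl or libxl, and it assumes the table layout
is exactly the one shown above.

```python
from collections import defaultdict


def parse_cpu_topology(text):
    """Return {(socket, core): [pcpu, ...]} from `xl info -n` style output.

    Hypothetical helper: it only understands the 'cpu: core socket node'
    table layout shown in this mail.
    """
    groups = defaultdict(list)
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("cpu:"):
            # Header row of the table; data rows follow.
            in_table = True
            continue
        if not in_table:
            continue
        fields = stripped.replace(":", " ").split()
        if len(fields) != 4 or not all(f.isdigit() for f in fields):
            break  # first non-data line (e.g. numa_info) ends the table
        cpu, core, socket, _node = map(int, fields)
        groups[(socket, core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in groups.items()}


SAMPLE = """\
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
  4:       2        0        0
  5:       2        0        0
  6:       3        0        0
  7:       3        0        0
numa_info              :
node:    memsize    memfree    distances
   0:     34062      31029      10
"""

for (socket, core), cpus in sorted(parse_cpu_topology(SAMPLE).items()):
    print(f"socket {socket} core {core}: pCPUs {cpus}")
```

On this host, that gives four cores with two hyperthreads each, i.e.,
pCPUs (0,1), (2,3), (4,5) and (6,7) are the SMT sibling pairs.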

With the following guest configuration, in terms of vcpu pinning:

1) 2 vCPUs ==> same pCPUs
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                 9     0    0   -b-       2.7  0
debian.guest.osstest                 9     1    0   -b-       5.2  0
debian.guest.osstest                 9     2    7   -b-       2.4  7
debian.guest.osstest                 9     3    7   -b-       4.4  7

2) no SMT
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                11     0    0   -b-       0.6  0
debian.guest.osstest                11     1    2   -b-       0.4  2
debian.guest.osstest                11     2    4   -b-       1.5  4
debian.guest.osstest                11     3    6   -b-       0.5  6

3) Random
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                12     0    3   -b-       1.6  all
debian.guest.osstest                12     1    1   -b-       1.4  all
debian.guest.osstest                12     2    5   -b-       2.4  all
debian.guest.osstest                12     3    7   -b-       1.5  all

4) yes SMT
root@benny:~# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                14     0    1   -b-       1.0  1
debian.guest.osstest                14     1    2   -b-       1.8  2
debian.guest.osstest                14     2    6   -b-       1.1  6
debian.guest.osstest                14     3    7   -b-       0.8  7
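
For reference, if the guest topology simply followed the hard pinning,
the thread-sibling groups one might expect for cases 1, 2 and 4 can be
sketched as below. This is only an illustration of that assumption: the
helper name is hypothetical, the pCPU-to-core map is copied from the
host table above, and nothing in Xen is claimed to compute it this way.
Case 3 is left out because there is no hard affinity there.

```python
from collections import defaultdict

# pCPU -> core on the host above: 8 pCPUs, 2 threads per core, 1 socket.
PCPU_TO_CORE = {cpu: cpu // 2 for cpu in range(8)}


def expected_sibling_groups(vcpu_to_pcpu):
    """Group vCPUs whose pinned pCPUs belong to the same physical core.

    Hypothetical helper: assumes each vCPU is hard-pinned to one pCPU.
    """
    by_core = defaultdict(list)
    for vcpu, pcpu in sorted(vcpu_to_pcpu.items()):
        by_core[PCPU_TO_CORE[pcpu]].append(vcpu)
    return sorted(by_core.values())


# vCPU -> pCPU maps, taken from the xl vcpu-list output above.
CASES = {
    "1) same pCPUs": {0: 0, 1: 0, 2: 7, 3: 7},
    "2) no SMT": {0: 0, 1: 2, 2: 4, 3: 6},
    "4) yes SMT": {0: 1, 1: 2, 2: 6, 3: 7},
}

for name, pinning in CASES.items():
    print(name, expected_sibling_groups(pinning))
```

Under that assumption, none of the three pinned cases would put all
four vCPUs in one sibling group.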

And, in *all* these 4 cases, here's what I see:

root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/core_siblings_list
0-3
0-3
0-3
0-3

root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
0-3
0-3
0-3
0-3

root@debian:~# lstopo
Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#1)
  PU L#2 (P#2)
  PU L#3 (P#3)

root@debian:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3591.780
BogoMIPS:              7183.56
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
they were all SMT siblings, within the same core, sharing all cache
levels.
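
For anyone wanting to repeat the check inside their own guest, here is a
minimal Python sketch that collects the distinct thread-sibling groups
from the same sysfs files I used above. The function names are
hypothetical, and it assumes the usual Linux cpulist format ("0-3",
"0,2-3") in thread_siblings_list.

```python
import glob


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3' or '0,2-3' into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare number ("5") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def thread_sibling_groups(base="/sys/devices/system/cpu"):
    """Return the distinct SMT sibling groups the kernel reports."""
    groups = set()
    pattern = base + "/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(frozenset(parse_cpulist(f.read())))
    return sorted(sorted(group) for group in groups)
```

Calling thread_sibling_groups() in the guest above should yield a single
group holding all four vCPUs, since every thread_siblings_list file
reads "0-3".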

This is not the case for dom0 where (I booted with dom0_max_vcpus=4 on
the xen command line) I see this:

root@benny:~# lstopo
Machine (422MB)
  Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#1)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#3)

root@benny:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3591.780
BogoMIPS:              7183.56
Hypervisor vendor:     Xen
Virtualization type:   none
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

What am I doing wrong, or what am I missing?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

