Re: [PATCH v6 00/10] vnuma introduction

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dario Faggioli <dario.faggioli@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com,
	msw@linux.com, lccycc123@gmail.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, JBeulich@suse.com,
	Elena Ufimtseva <ufimtseva@gmail.com>
Subject: Re: [PATCH v6 00/10] vnuma introduction
Date: Tue, 22 Jul 2014 16:03:44 +0200	[thread overview]
Message-ID: <1406037824.17850.46.camel@Solace> (raw)
In-Reply-To: <20140718114834.GI7142@zion.uk.xensource.com>


[-- Attachment #1.1: Type: text/plain, Size: 6828 bytes --]

On ven, 2014-07-18 at 12:48 +0100, Wei Liu wrote:
> On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> > On ven, 2014-07-18 at 10:53 +0100, Wei Liu wrote:

> > > I've also encountered this. I suspect that even if you disble SMT with
> > > cpuid in config file, the cpu topology in guest might still be wrong.
> > >
> > Can I ask why?
> > 
> 
> Because for a PV guest (currently) the guest kernel sees the real "ID"s
> for a cpu. See those "ID"s I change in my hacky patch.
> 
Right, now I see/remember it. Well, this is, I think, something we
should try to fix _independently_ from vNUMA, isn't it?

I mean, even right now, PV guests see completely random cache-sharing
topology, and that does (at least potentially) affect performance, as
the guest scheduler will make incorrect/inconsistent assumptions.

I'm not sure what the correct fix is. Probably something similar to what
you're doing in your hack... but, indeed, I think we should do something
about this!

> > > What do hwloc-ls and lscpu show? Do you see any weird topology like one
> > > core belongs to one node while three belong to another?
> > >
> > Yep, that would be interesting to see.
> > 
> > >  (I suspect not
> > > because your vcpus are already pinned to a specific node)
> > > 
> > Sorry, I'm not sure I follow here... Are you saying that things probably
> > works ok, but that is (only) because of pinning?
> 
> Yes, given that you derive numa memory allocation from cpu pinning or
> use combination of cpu pinning, vcpu to vnode map and vnode to pnode
> map, in those cases those IDs might reflect the right topology.
> 
Well, pinning does (should?) not always happen, as a consequence of a
virtual topology being used.

So, again, I don't think we should rely on pinning to have a sane and,
more important, consistent SMT and cache sharing topology.

Linux maintainers, any ideas?


BTW, I tried a few examples, on the following host:

root@benny:~# xl info -n
...
nr_cpus                : 8
max_cpu_id             : 15
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 3591
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
  4:       2        0        0
  5:       2        0        0
  6:       3        0        0
  7:       3        0        0
numa_info              :
node:    memsize    memfree    distances
   0:     34062      31029      10

With the following guest configuration, in terms of vcpu pinning:

1) 2 vCPUs ==> same pCPUs
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                 9     0    0   -b-       2.7  0
debian.guest.osstest                 9     1    0   -b-       5.2  0
debian.guest.osstest                 9     2    7   -b-       2.4  7
debian.guest.osstest                 9     3    7   -b-       4.4  7

2) no SMT
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU
Affinity
debian.guest.osstest                11     0    0   -b-       0.6  0
debian.guest.osstest                11     1    2   -b-       0.4  2
debian.guest.osstest                11     2    4   -b-       1.5  4
debian.guest.osstest                11     3    6   -b-       0.5  6

3) Random
root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU
Affinity
debian.guest.osstest                12     0    3   -b-       1.6  all
debian.guest.osstest                12     1    1   -b-       1.4  all
debian.guest.osstest                12     2    5   -b-       2.4  all
debian.guest.osstest                12     3    7   -b-       1.5  all

4) yes SMT
root@benny:~# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU
Affinity
debian.guest.osstest                14     0    1   -b-       1.0  1
debian.guest.osstest                14     1    2   -b-       1.8  2
debian.guest.osstest                14     2    6   -b-       1.1  6
debian.guest.osstest                14     3    7   -b-       0.8  7

And, in *all* these 4 cases, here's what I see:

root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/core_siblings_list
0-3
0-3
0-3
0-3

root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
0-3
0-3
0-3
0-3

root@debian:~# lstopo
Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#1)
  PU L#2 (P#2)
  PU L#3 (P#3)

root@debian:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3591.780
BogoMIPS:              7183.56
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
they were all SMT siblings, within the same core, sharing all cache
levels.

This is not the case for dom0 where (I booted with dom0_max_vcpus=4 on
the xen command line) I see this:

root@benny:~# lstopo
Machine (422MB)
  Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#1)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#3)

root@benny:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3591.780
BogoMIPS:              7183.56
Hypervisor vendor:     Xen
Virtualization type:   none
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

What am I doing wrong, or what am I missing?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2014-07-22 14:03 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-18  5:49 [PATCH v6 00/10] vnuma introduction Elena Ufimtseva
2014-07-18  5:50 ` [PATCH v6 01/10] xen: vnuma topology and subop hypercalls Elena Ufimtseva
2014-07-18 10:30   ` Wei Liu
2014-07-20 13:16     ` Elena Ufimtseva
2014-07-20 15:59       ` Wei Liu
2014-07-22 15:18         ` Dario Faggioli
2014-07-23  5:33           ` Elena Ufimtseva
2014-07-18 13:49   ` Konrad Rzeszutek Wilk
2014-07-20 13:26     ` Elena Ufimtseva
2014-07-22 15:14   ` Dario Faggioli
2014-07-23  5:22     ` Elena Ufimtseva
2014-07-23 14:06   ` Jan Beulich
2014-07-25  4:52     ` Elena Ufimtseva
2014-07-25  7:33       ` Jan Beulich
2014-07-18  5:50 ` [PATCH v6 02/10] xsm bits for vNUMA hypercalls Elena Ufimtseva
2014-07-18 13:50   ` Konrad Rzeszutek Wilk
2014-07-18 15:26     ` Daniel De Graaf
2014-07-20 13:48       ` Elena Ufimtseva
2014-07-18  5:50 ` [PATCH v6 03/10] vnuma hook to debug-keys u Elena Ufimtseva
2014-07-23 14:10   ` Jan Beulich
2014-07-18  5:50 ` [PATCH v6 04/10] libxc: Introduce xc_domain_setvnuma to set vNUMA Elena Ufimtseva
2014-07-18 10:33   ` Wei Liu
2014-07-29 10:33   ` Ian Campbell
2014-07-18  5:50 ` [PATCH v6 05/10] libxl: vnuma topology configuration parser and doc Elena Ufimtseva
2014-07-18 10:53   ` Wei Liu
2014-07-20 14:04     ` Elena Ufimtseva
2014-07-29 10:38   ` Ian Campbell
2014-07-29 10:42   ` Ian Campbell
2014-08-06  4:46     ` Elena Ufimtseva
2014-07-18  5:50 ` [PATCH v6 06/10] libxc: move code to arch_boot_alloc func Elena Ufimtseva
2014-07-29 10:38   ` Ian Campbell
2014-07-18  5:50 ` [PATCH v6 07/10] libxc: allocate domain memory for vnuma enabled Elena Ufimtseva
2014-07-29 10:43   ` Ian Campbell
2014-08-06  4:48     ` Elena Ufimtseva
2014-07-18  5:50 ` [PATCH v6 08/10] libxl: build numa nodes memory blocks Elena Ufimtseva
2014-07-18 11:01   ` Wei Liu
2014-07-20 12:58     ` Elena Ufimtseva
2014-07-20 15:59       ` Wei Liu
2014-07-18  5:50 ` [PATCH v6 09/10] libxl: vnuma nodes placement bits Elena Ufimtseva
2014-07-18  5:50 ` [PATCH v6 10/10] libxl: set vnuma for domain Elena Ufimtseva
2014-07-18 10:58   ` Wei Liu
2014-07-29 10:45   ` Ian Campbell
2014-08-12  3:52     ` Elena Ufimtseva
2014-08-12  9:42       ` Wei Liu
2014-08-12 17:10         ` Dario Faggioli
2014-08-12 17:13           ` Wei Liu
2014-08-12 17:24             ` Elena Ufimtseva
2014-07-18  6:16 ` [PATCH v6 00/10] vnuma introduction Elena Ufimtseva
2014-07-18  9:53 ` Wei Liu
2014-07-18 10:13   ` Dario Faggioli
2014-07-18 11:48     ` Wei Liu
2014-07-20 14:57       ` Elena Ufimtseva
2014-07-22 15:49         ` Dario Faggioli
2014-07-22 14:03       ` Dario Faggioli [this message]
2014-07-22 14:48         ` Wei Liu
2014-07-22 15:06           ` Dario Faggioli
2014-07-22 16:47             ` Wei Liu
2014-07-22 19:43         ` Is: cpuid creation of PV guests is not correct. Was:Re: " Konrad Rzeszutek Wilk
2014-07-22 22:34           ` Is: cpuid creation of PV guests is not correct Andrew Cooper
2014-07-22 22:53           ` Is: cpuid creation of PV guests is not correct. Was:Re: [PATCH v6 00/10] vnuma introduction Dario Faggioli
2014-07-23  6:00             ` Elena Ufimtseva
2014-07-22 12:49 ` Dario Faggioli
2014-07-23  5:59   ` Elena Ufimtseva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1406037824.17850.46.camel@Solace \
    --to=dario.faggioli@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=keir@xen.org \
    --cc=lccycc123@gmail.com \
    --cc=msw@linux.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=ufimtseva@gmail.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.