From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
    Wei Liu <Wei.Liu2@citrix.com>,
    David Vrabel <david.vrabel@citrix.com>,
    Jan Beulich <JBeulich@suse.com>,
    "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
    Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Tue, 28 Jul 2015 00:19:09 +0100
Message-ID: <55B6BC6D.8020808@citrix.com>
In-Reply-To: <1438018925.5036.242.camel@citrix.com>
On 27/07/2015 18:42, Dario Faggioli wrote:
> On Mon, 2015-07-27 at 17:33 +0100, Andrew Cooper wrote:
>> On 27/07/15 17:31, David Vrabel wrote:
>>>
>>>> Yeah, indeed. That's the downside of Juergen's "Linux scheduler
>>>> approach". But the issue is there, even without taking vNUMA into
>>>> account, and I think something like that would really help (only for
>>>> Dom0, and Linux guests, of course).
>>> I disagree. Whether we're using vNUMA or not, Xen should still ensure
>>> that the guest kernel and userspace see a consistent and correct
>>> topology using the native mechanisms.
>>
>> +1
>>
> +1 from me as well. In fact, a mechanism for making exactly such thing
> happen, was what I was after when starting the thread.
>
> Then it came up that CPUID needs to be used for at least two different
> and potentially conflicting purposes, that we want to support both and
> that, whether and for whatever reason it's used, Linux configures its
> scheduler after it, potentially resulting in rather pathological setups.

I don't see what the problem is here. Fundamentally, "NUMA optimise" vs
"comply with licence" is a user/admin decision at boot time, and we need
not cater to both halves at the same time.
Supporting either, as chosen by the admin, is worthwhile.

>
> It's at that point that some decoupling started to appear
> interesting... :-P
>
> Also, are we really being consistent? If my methodology is correct
> (which might not be, please, double check, and sorry for that), I'm
> seeing quite some inconsistency around:
>
> HOST:
>  root@Zhaman:~# xl info -n
>  ...
>  cpu_topology           :
>  cpu:    core    socket     node
>    0:       0        1        0
>    1:       0        1        0
>    2:       1        1        0
>    3:       1        1        0
>    4:       9        1        0
>    5:       9        1        0
>    6:      10        1        0
>    7:      10        1        0
>    8:       0        0        1
>    9:       0        0        1
>   10:       1        0        1
>   11:       1        0        1
>   12:       9        0        1
>   13:       9        0        1
>   14:      10        0        1
>   15:      10        0        1

o_O
What kind of system results in this layout? Can you dump the ACPI
tables and make them available?
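
To make the oddity in that table easier to see, here is a rough sketch that groups the quoted rows by socket, core and node. It is plain Python with nothing Xen-specific in it; the table text is pasted from the `xl info -n` output quoted above, and the helper names (`parse_cpu_topology`, `smt_siblings`, `sockets_per_node`) are made up for illustration.

```python
from collections import defaultdict

# cpu_topology rows as quoted above from `xl info -n`
# (columns: cpu, core, socket, node).
XL_INFO_TOPOLOGY = """\
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       9        1        0
  5:       9        1        0
  6:      10        1        0
  7:      10        1        0
  8:       0        0        1
  9:       0        0        1
 10:       1        0        1
 11:       1        0        1
 12:       9        0        1
 13:       9        0        1
 14:      10        0        1
 15:      10        0        1
"""


def parse_cpu_topology(text):
    """Return {cpu: (core, socket, node)}, skipping the header line."""
    topo = {}
    for line in text.splitlines():
        left, _, right = line.partition(":")
        fields = right.split()
        if not left.strip().isdigit() or len(fields) != 3:
            continue  # header or unrelated line
        core, socket, node = (int(f) for f in fields)
        topo[int(left)] = (core, socket, node)
    return topo


def smt_siblings(topo):
    """Group pCPUs that share the same (socket, core) pair."""
    groups = defaultdict(list)
    for cpu, (core, socket, _node) in sorted(topo.items()):
        groups[(socket, core)].append(cpu)
    return dict(groups)


def sockets_per_node(topo):
    """Map each NUMA node to the sorted list of sockets seen on it."""
    nodes = defaultdict(set)
    for _core, socket, node in topo.values():
        nodes[node].add(socket)
    return {node: sorted(sockets) for node, sockets in nodes.items()}


topo = parse_cpu_topology(XL_INFO_TOPOLOGY)
siblings = smt_siblings(topo)
node_sockets = sockets_per_node(topo)
core_ids = sorted({core for core, _socket, _node in topo.values()})

print("sockets per node:", node_sockets)
print("distinct core ids:", core_ids)
```

On the quoted data this shows socket 1 sitting on node 0 and socket 0 on node 1, with non-contiguous core IDs 0, 1, 9 and 10 on each socket.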

>
>  ...
>  root@Zhaman:~# xl vcpu-list test
>  Name      ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test       2     0     0   r--       1.5  0 / all
>  test       2     1     1   r--       0.2  1 / all
>  test       2     2     8   -b-       2.2  8 / all
>  test       2     3     9   -b-       2.0  9 / all
>
> GUEST (HVM, 4 vcpus):
>  root@test:~# cpuid|grep CORE_ID
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=1
>     (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
>     (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=1
>
> HOST:
>  root@Zhaman:~# xl vcpu-pin 2 all 0
>  root@Zhaman:~# xl vcpu-list 2
>  Name      ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test       2     0     0   -b-      43.7  0 / all
>  test       2     1     0   -b-      38.4  0 / all
>  test       2     2     0   -b-      36.9  0 / all
>  test       2     3     0   -b-      38.8  0 / all
>
> GUEST:
>  root@test:~# cpuid|grep CORE_ID
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>     (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>
> HOST:
>  root@Zhaman:~# xl vcpu-pin 2 0 7
>  root@Zhaman:~# xl vcpu-pin 2 1 7
>  root@Zhaman:~# xl vcpu-pin 2 2 15
>  root@Zhaman:~# xl vcpu-pin 2 3 15
>  root@Zhaman:~# xl vcpu-list 2
>  Name      ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test       2     0     7   -b-      44.3  7 / all
>  test       2     1     7   -b-      38.9  7 / all
>  test       2     2    15   -b-      37.3  15 / all
>  test       2     3    15   -b-      39.2  15 / all
>
> GUEST:
>  root@test:~# cpuid|grep CORE_ID
>     (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>     (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>     (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>     (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>
> So, it looks to me that:
>  1) any application using CPUID for either licensing or
>     placement/performance optimization will get (potentially) random
>     results;
>  2) whatever set of values the kernel used, during guest boot, to build
>     up its internal scheduling data structures, has no guarantee of
>     being related to any value returned by CPUID, at a later point.
>
> Hence, I think I'm seeing inconsistency between kernel and userspace
> (and between userspace and itself, over time) already... Am I
> overlooking something?

All current CPUID values presented to guests are about as reliable as
being picked from /dev/urandom. (This isn't strictly true - the feature
flags will be in the right ballpark if the VM has not migrated yet).
Fixing this (as described in my feature levelling design document) is
sufficiently non-trivial that it has been deferred to post
feature-levelling work.
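
For illustration only, the instability Dario measured can be checked mechanically. The sketch below takes the three guest `cpuid|grep CORE_ID` outputs quoted above and reports which vCPUs saw their (PKG_ID, CORE_ID, SMT_ID) tuple change between runs. It assumes the lines appear in vCPU order 0..3, which the quoted output does not state, and the helper names (`parse_apic_synth`, `unstable_vcpus`) are hypothetical.

```python
import re

# Guest-side output quoted above, one snapshot per host pinning.
# Assumption: lines are listed in vCPU order 0..3.
SNAPSHOTS = {
    "initial (pCPUs 0,1,8,9)": """\
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=1
(APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
(APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=1
""",
    "all pinned to pCPU 0": """\
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
(APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
""",
    "pinned to pCPUs 7,7,15,15": """\
(APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
(APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
(APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
(APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
""",
}

APIC_RE = re.compile(r"PKG_ID=(\d+)\s+CORE_ID=(\d+)\s+SMT_ID=(\d+)")


def parse_apic_synth(text):
    """Return a list of (pkg, core, smt) tuples, one per matching line."""
    return [tuple(int(g) for g in m.groups()) for m in APIC_RE.finditer(text)]


def unstable_vcpus(runs):
    """Indices of vCPUs whose reported tuple differs between any two runs."""
    return [i for i, ids in enumerate(zip(*runs)) if len(set(ids)) > 1]


runs = [parse_apic_synth(text) for text in SNAPSHOTS.values()]
unstable = unstable_vcpus(runs)
# How many distinct topology identities the guest sees in each run.
distinct_per_run = [len(set(run)) for run in runs]

print("vCPUs with changing topology IDs:", unstable)
print("distinct identities per run:", distinct_per_run)
```

On the quoted data every vCPU changes identity at least once, and the guest sees four, one and then two distinct identities for the same four vCPUs.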
~Andrew