From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
Juergen Gross <jgross@suse.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Wed, 22 Jul 2015 11:32:03 -0400
Message-ID: <55AFB773.7010906@oracle.com>
In-Reply-To: <1437576645.5036.56.camel@citrix.com>
On 07/22/2015 10:50 AM, Dario Faggioli wrote:
> On Wed, 2015-07-22 at 16:09 +0200, Juergen Gross wrote:
>> On 07/22/2015 03:58 PM, Boris Ostrovsky wrote:
>>> What if I configure a guest to follow HW topology? I.e. I pin VCPUs to
>>> appropriate cores/threads? With elfnote I am stuck with disabled topology.
>> Add an option to do exactly that: follow HW topology (pin vcpus,
>> configure vnuma)?
>>
> I thought about configuring things so that they match the host
> topology too, as Boris is suggesting. In that case, having the
> toolstack arrange for that when PV vNUMA is detected (as I think
> Juergen is suggesting) seems a good approach.
>
> However, when I try to do that manually on my box, I don't seem to be
> able to.
>
> Here's what I tried. Since I have this host topology:
> cpu_topology           :
> cpu:    core    socket     node
>   0:       0         1        0
>   1:       0         1        0
>   2:       1         1        0
>   3:       1         1        0
>   4:       9         1        0
>   5:       9         1        0
>   6:      10         1        0
>   7:      10         1        0
>   8:       0         0        1
>   9:       0         0        1
>  10:       1         0        1
>  11:       1         0        1
>  12:       9         0        1
>  13:       9         0        1
>  14:      10         0        1
>  15:      10         0        1
>
> I configured the guest like this:
> vcpus = '4'
> memory = '1024'
> vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20" ],
> [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10" ] ]
> cpus=["0","1","8","9"]
>
> This means vcpus 0 and 1, which are assigned to vnode 0, are pinned to
> pcpus 0 and 1, which are siblings per the host topology.
> Similarly, vcpus 2 and 3, assigned to vnode 1, are pinned to two
> sibling pcpus on pnode 1.
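For illustration, the check done by hand here (each vnode's vcpus pinned to hyperthread siblings on that vnode's pnode) can be sketched in Python. The topology table is copied from the `cpu_topology` output above; the name `check_pinning` and its input shapes are hypothetical, and the "one core per vnode" test assumes, as in this configuration, that each vnode holds exactly the two threads of one core. Since core IDs repeat across sockets, a core is identified by the (core, socket) pair.

```python
# Host topology copied from the `cpu_topology` output above:
# pcpu -> (core, socket, node)
HOST_TOPOLOGY = {
    0: (0, 1, 0), 1: (0, 1, 0), 2: (1, 1, 0), 3: (1, 1, 0),
    4: (9, 1, 0), 5: (9, 1, 0), 6: (10, 1, 0), 7: (10, 1, 0),
    8: (0, 0, 1), 9: (0, 0, 1), 10: (1, 0, 1), 11: (1, 0, 1),
    12: (9, 0, 1), 13: (9, 0, 1), 14: (10, 0, 1), 15: (10, 0, 1),
}


def check_pinning(topology, cpus, vnodes):
    """Return a list of problems (empty if the pinning follows the host).

    cpus:   pcpu each vcpu is hard-pinned to, as in `cpus=[...]`
    vnodes: (pnode, [vcpus]) per vnode, as in the `vnuma` entries
    Hypothetical helper; assumes each vnode should hold one core's threads.
    """
    problems = []
    for vnode, (pnode, vcpus) in enumerate(vnodes):
        pcpus = [cpus[v] for v in vcpus]
        # Core IDs repeat across sockets, so a core is (core, socket).
        cores = {topology[p][:2] for p in pcpus}
        nodes = {topology[p][2] for p in pcpus}
        if nodes != {pnode}:
            problems.append(
                f"vnode {vnode}: pcpus {pcpus} not all on pnode {pnode}")
        if len(cores) != 1:
            problems.append(f"vnode {vnode}: pcpus {pcpus} are not siblings")
    return problems


# The configuration above: vcpus 0-1 on vnode 0 (pnode 0), 2-3 on vnode 1.
print(check_pinning(HOST_TOPOLOGY, [0, 1, 8, 9], [(0, [0, 1]), (1, [2, 3])]))
# prints []
```

Such a check only confirms the pinning itself; as the `xl vcpu-list` output below shows, correct pinning alone did not make the guest spread the load.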
>
> This seems to be honoured:
> # xl vcpu-list 4
> Name       ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
> test        4     0     0  -b-        10.9  0 / 0-7
> test        4     1     1  -b-         7.6  1 / 0-7
> test        4     2     8  -b-         0.1  8 / 8-15
> test        4     3     9  -b-         0.1  9 / 8-15
>
> And yet, no joy:
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> # xl vcpu-list 4
> Name       ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
> test        4     0     0  r--        16.4  0 / 0-7
> test        4     1     1  r--        12.5  1 / 0-7
> test        4     2     8  -b-         0.2  8 / 8-15
> test        4     3     9  -b-         0.1  9 / 8-15
>
> So, what am I doing wrong in "following the HW topology"?
>
>>> Besides, this is not necessarily a NUMA-only issue, it's a scheduling
>>> one (inside the guest) as well.
>> Sure. That's what Jan said regarding SUSE's xen-kernel. No topology info
>> (or a trivial one) might be better than the wrong one...
>>
> Yep. Exactly. As Boris says, this is a generic scheduling issue, although
> it's true that it's only (as far as I can tell) with vNUMA that it bites
> us so hard...
I am not sure that it's only vNUMA. It's just that with vNUMA we see
a warning (on your system) that something is going wrong. In other cases
(like scheduling, or sizing objects based on discovered cache sizes) we
don't see anything in the log, but the system and programs still make
wrong decisions. (And your results above may well be an example of that.)
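To see which sibling relationships a Linux guest actually derived, one can read the cpulists it exposes under `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. A minimal Python sketch, assuming a Linux guest with sysfs mounted; the helper names `parse_cpulist` and `guest_sibling_groups` are hypothetical:

```python
import glob
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-1,8" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def guest_sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU the guest kernel sees to the CPUs it treats as its
    hyperthread siblings, read from topology/thread_siblings_list."""
    groups = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology",
                           "thread_siblings_list")
    for path in glob.glob(pattern):
        # path is .../cpuN/topology/thread_siblings_list; take "cpuN".
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            groups[int(cpu_dir[len("cpu"):])] = parse_cpulist(f.read())
    return groups


if __name__ == "__main__":
    # Run inside the guest and compare with the intended pinning.
    for cpu, siblings in sorted(guest_sibling_groups().items()):
        print(f"cpu{cpu}: siblings {siblings}")
```

With the pinning from Dario's test (vcpus 0/1 and 2/3 on host sibling pairs), a grouping other than {0, 1} and {2, 3} would suggest the guest is working from a topology that does not match the pinning.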
-boris
> I mean, performance is always going to be inconsistent,
> but it's only in that case that you basically _lose_ some of the
> vcpus! :-O
>
> Dario