From: Dario Faggioli <dario.faggioli@citrix.com>
To: Juergen Gross <jgross@suse.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Wed, 29 Jul 2015 09:44:55 +0200
Message-ID: <1438155895.16912.10.camel@citrix.com>
In-Reply-To: <55B79B9C.6030505@suse.com>
On Tue, 2015-07-28 at 17:11 +0200, Juergen Gross wrote:
> On 07/28/2015 06:29 AM, Juergen Gross wrote:
> > On 07/27/2015 04:09 PM, Dario Faggioli wrote:
> >> On Fri, 2015-07-24 at 18:10 +0200, Juergen Gross wrote:
> >>> On 07/24/2015 05:58 PM, Dario Faggioli wrote:
> >>
> >>>> So, just to check whether my understanding is correct: you'd like to
> >>>> add an abstraction layer in Linux, in, say, generic (or, perhaps,
> >>>> scheduling) code, to hide the direct interaction with CPUID.
> >>>> Such a layer, on bare metal, would just read CPUID while, on pv-ops,
> >>>> it'd check with Xen/match vNUMA/whatever... Is this what you are saying?
> >>>
> >>> Sort of, yes.
> >>>
> >>> I just wouldn't add it, as it already exists (more or less). It can
> >>> deal with AMD and Intel right now; we would "just" have to add Xen.
> >>>
> >> So, having gone through the rest of the thread (so far), and having
> >> given a fair amount of thought to this, I really think that something
> >> like this would be a good thing to have in Linux.
> >>
> >> Of course, it's not that my opinion on where this should live in Linux
> >> counts that much! :-D Nevertheless, I wanted to make it clear that,
> >> while skeptical at the beginning, I now think this is (part of) the way
> >> to go, as I said and explained in my reply to George.
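Just to make that a bit more concrete, here is a rough sketch of the shape
such a layer could take (all the names below are made up for illustration;
this is not an existing kernel interface): a topology backend gets picked
once at boot, the bare metal one keeps the current CPUID parsing, and the
Xen PV one ignores CPUID and builds a flat (or vNUMA based) topology
instead.

#include <stdbool.h>

struct cpu_topology_ids {
	unsigned int pkg_id;	/* physical package / socket */
	unsigned int core_id;	/* core within the package   */
	unsigned int smt_id;	/* thread within the core    */
};

struct cpu_topology_ops {
	void (*detect)(unsigned int cpu, struct cpu_topology_ids *ids);
};

/* Bare metal: keep deriving everything from CPUID, as done today. */
static void native_detect(unsigned int cpu, struct cpu_topology_ids *ids)
{
	/* ... the existing CPUID leaf 0xb / 0x8000001e parsing would go here ... */
	(void)cpu;
	(void)ids;
}

/*
 * Xen PV: the CPUID values the guest sees are not meaningful, so build a
 * flat topology instead (every vCPU is its own "package"), possibly
 * grouping vCPUs per vNUMA node if one was specified.
 */
static void xen_pv_detect(unsigned int cpu, struct cpu_topology_ids *ids)
{
	ids->pkg_id  = cpu;
	ids->core_id = 0;
	ids->smt_id  = 0;
}

static const struct cpu_topology_ops native_topo_ops = { .detect = native_detect };
static const struct cpu_topology_ops xen_pv_topo_ops = { .detect = xen_pv_detect };

/* Selected once, early at boot, depending on where we are running. */
static const struct cpu_topology_ops *topo_ops = &native_topo_ops;

void topology_select_backend(bool running_as_xen_pv_guest)
{
	topo_ops = running_as_xen_pv_guest ? &xen_pv_topo_ops : &native_topo_ops;
}

Nothing would change on bare metal; the Xen PV backend would be selected
from the early Xen init code, and whatever consumes topology information
(sibling maps, /proc/cpuinfo, scheduling domains) would go through the
backend instead of poking CPUID directly.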
> >
> > I think it's time to obtain some real numbers.
> >
> > I'll make some performance tests on a big machine (4 sockets, 60 cores,
> > 120 threads) regarding topology information:
> >
> > - bare metal
> > - "random" topology (like today)
> > - "simple" topology (all vcpus regarded as equal)
> > - "real" topology with all vcpus pinned
> >
> > This should show:
> >
> > - how intrusive would the topology patch(es) be?
> > - what is the performance impact of a "wrong" scheduling database
>
> On the above box I used a pvops kernel 4.2-rc4 plus a rather small patch
> (see attachment). I did 5 kernel builds in each environment:
>
> make clean
> time make -j 120
>
> The first result of the 5 runs was always omitted, as that run still has
> to build up buffer caches etc. The Xen cases were all done in dom0;
> pinning of vcpus in the last scenario was done via the hypervisor's
> dom0_vcpus_pin boot parameter.
>
> Here are the results (everything in seconds):
>
>                      elapsed   user   system
> bare metal:              100   5770      805
> "random" topology:       283   6740    20700
> "simple" topology:       290   6740    22200
> "real" topology:         185   7800     8040
>
> As expected, bare metal is the best. Next is "real" topology with pinned
> vcpus (expected again - but system time is already up by a factor of 10!).
> What I didn't expect is that "random" is better than "simple" topology. I
> could test some other topologies (e.g. everything on one socket, or even
> on one core), but I'm not sure that makes sense. I didn't check the exact
> topology result of the "random" case; maybe I'll do that tomorrow with
> another measurement.
>
> BTW: the topology hack is working, as each cpu is shown to have a
> sibling count of 1 in /proc/cpuinfo.
>
Hey, just a 'wild' idea, as a possible explanation (or a direction for
further investigation) of why 'simple' is actually worse than 'random'.

Could it be that, by setting up the topology like in 'simple', we create
more work for the Linux scheduler, and hence incur more overhead? Or,
looking at things the other way round: which topology imposes the least
overhead, in terms of load balancing work, on the Linux scheduler?

I think we should identify that topology and try it, as that is what we
want as our flat topology in the absence of any vNUMA topology being
specified (and, with vNUMA, that's what we want within the virtual nodes).

I don't really know, right now, whether such a topology would be 'all
cores, no SMT siblings', or 'all siblings of each other', or 'all
sockets, no core/SMT siblings'... This requires inspecting the
scheduler's and scheduling domains' code, which I can do, but not today,
as I won't be working. If you (or anyone else) think it's worth it and
actually have a look, let me/us know your findings. If not, I'll do it
myself tomorrow.
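For whoever gets to look at it first: IIRC, the relevant bits are the
sched_domain_topology_level table (default_topology[] in
kernel/sched/core.c) and set_sched_topology(). Just to show the shape of
what we would end up tweaking (a sketch from memory, untested), the fully
flat 'all vcpus are equal' case would basically mean installing a table
with the SMT and MC levels dropped:

/*
 * Sketch only, untested: with a flat "all vcpus are equal" topology
 * there is nothing useful for the SMT and MC scheduling domain levels
 * to describe, so a PV guest could install a reduced table and keep
 * only the top level.
 */
static struct sched_domain_topology_level pv_flat_topology[] = {
	/* no SMT level, no MC level */
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

static void pv_setup_sched_topology(void)
{
	set_sched_topology(pv_flat_topology);
}

The other candidate topologies would just mean different mask functions in
that table (or keeping more levels), so comparing them should mostly be a
matter of measuring rather than of writing much code.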
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)