xen-devel.lists.xenproject.org archive mirror
From: Dario Faggioli <dario.faggioli@citrix.com>
To: Juergen Gross <jgross@suse.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: PV-vNUMA issue: topology is misinterpreted by the guest
Date: Fri, 24 Jul 2015 17:58:29 +0200
Message-ID: <1437753509.4682.78.camel@citrix.com>
In-Reply-To: <55B258C9.4040400@suse.com>


On Fri, 2015-07-24 at 17:24 +0200, Juergen Gross wrote:
> On 07/24/2015 05:14 PM, Juergen Gross wrote:
> > On 07/24/2015 04:44 PM, Dario Faggioli wrote:

> >> In fact, I think that it is the topology, i.e., what comes from MSRs,
> >> that needs to adapt, and follow vNUMA, as much as possible. Do we agree
> >> on this?
> >
> > I think we have to be very careful here. I see two possible scenarios:
> >
> > 1) The vcpus are not pinned 1:1 on physical cpus. The hypervisor will
> >     try to schedule the vcpus according to their numa affinity. So they
> >     can change pcpus at any time in case of very busy guests. I don't
> >     think the linux kernel should treat the cpus differently in this
> >     case, as it would be in vain given the Xen scheduler's activity.
> >     So we should use the "null" topology in this case.
> 
> Sorry, the topology should reflect the vcpu<->numa-node relations, of
> course, but nothing else (so flat topology in each numa node).
> 
Yeah, I was just replying to this point saying something like this...
Luckily, I saw this email first! :-P

With these semantics, I fully agree.
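As an illustration of the "flat topology in each numa node" semantics, here is a minimal Python sketch. It assumes a hypothetical `vcpu_to_vnode` map (vCPU index to vNUMA node) and encodes "flat" as one virtual package per vNUMA node, one core per vCPU and no SMT siblings. That encoding is only one possible choice, not what Xen or Linux actually emit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VcpuTopology:
    package_id: int             # one virtual package per vNUMA node (assumption)
    core_id: int                # each vCPU is a separate core in that package
    thread_siblings: frozenset  # no SMT: a vCPU is only its own sibling
    core_siblings: frozenset    # every vCPU of the same vNUMA node


def flat_topology(vcpu_to_vnode):
    """Map each vCPU to a flat topology derived only from its vNUMA node."""
    nodes = defaultdict(list)
    for vcpu in sorted(vcpu_to_vnode):
        nodes[vcpu_to_vnode[vcpu]].append(vcpu)

    topo = {}
    for vnode, vcpus in nodes.items():
        siblings = frozenset(vcpus)
        for core_id, vcpu in enumerate(vcpus):
            topo[vcpu] = VcpuTopology(vnode, core_id, frozenset({vcpu}), siblings)
    return topo


topo = flat_topology({0: 0, 1: 0, 2: 1, 3: 1})
print(topo[3].core_siblings)  # vCPUs 2 and 3 share vNUMA node 1
```

The only information used is the vCPU to vNUMA node relation, so nothing in the result depends on which pCPUs the vCPUs happen to run on at any given moment.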

> > 2) The vcpus of the guest are all pinned 1:1 to physical cpus. The Xen
> >     scheduler can't move vcpus between pcpus, so the linux kernel should
> >     see the real topology of the used pcpus in order to optimize for this
> >     picture.
> >
>
Mmm... I did think about this too, but I'm not sure. I see the value of
this of course, and the reason why it makes sense. However, pinning can
change on-line, via `xl vcpu-pin' and stuff. Also migration could make
things less certain, I think. What happens if we build on top of the
initial pinning, and then things change?

To be fair, some things already build on top of the initial pinning:
e.g., which physical NUMA node(s) we allocate the memory from depends
exactly on that. That being said, I'm not sure I'm comfortable with
adding more of this...
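A rough model of that dependency: the physical nodes memory is allocated from can be thought of as the nodes covered by the vCPUs' pinning masks. The sketch assumes hypothetical `vcpu_affinity` (vCPU to set of pCPUs) and `pcpu_to_node` maps, and ignores cpupools, soft affinity and everything else the real toolstack/hypervisor logic takes into account.

```python
def node_affinity(vcpu_affinity, pcpu_to_node):
    """Physical NUMA nodes covered by at least one vCPU's pinning mask."""
    return {pcpu_to_node[p] for mask in vcpu_affinity.values() for p in mask}


# Hypothetical host: pCPUs 0-3 on node 0, pCPUs 4-7 on node 1.
pcpu_to_node = {p: p // 4 for p in range(8)}

at_creation = node_affinity({0: {0, 1}, 1: {2, 3}}, pcpu_to_node)
after_repin = node_affinity({0: {0, 1}, 1: {4, 5}}, pcpu_to_node)
```

Pinning both vCPUs to node 0 at creation gives `{0}`; re-pinning vCPU 1 to pCPUs 4-5 later gives `{0, 1}`, while memory allocated at creation time was placed based on the former.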

Perhaps we could introduce an 'immutable_pinning' flag, which would
prevent the affinity from being changed, and then bind the topology to
the pinning only if that flag is set?
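The 'immutable_pinning' idea can be sketched as a simple policy. The `Domain` class, its methods and `PinningLocked` are hypothetical; only the flag name comes from the proposal. Host topology is exposed only when the flag is set and every vCPU is pinned to its own distinct pCPU; otherwise the flat per-vNUMA-node topology is used. Re-pinning (the analogue of `xl vcpu-pin`) is refused while the flag is set. Migration is not modelled.

```python
from dataclasses import dataclass


class PinningLocked(Exception):
    """Raised when trying to re-pin a domain whose pinning is immutable."""


@dataclass
class Domain:
    vcpu_affinity: dict              # vCPU -> set of pCPUs (hard affinity)
    immutable_pinning: bool = False  # flag name from the proposal

    def pinned_one_to_one(self):
        """True if every vCPU is pinned to exactly one, distinct, pCPU."""
        masks = list(self.vcpu_affinity.values())
        if any(len(m) != 1 for m in masks):
            return False
        pcpus = [next(iter(m)) for m in masks]
        return len(set(pcpus)) == len(pcpus)

    def topology_source(self):
        # Expose the real topology of the backing pCPUs only when it can
        # never change under the guest's feet; otherwise stay flat per node.
        if self.immutable_pinning and self.pinned_one_to_one():
            return "host"
        return "vnuma-flat"

    def set_affinity(self, vcpu, pcpus):
        # Analogue of `xl vcpu-pin`: refused once the pinning is frozen.
        if self.immutable_pinning:
            raise PinningLocked(f"vcpu {vcpu}: pinning is immutable")
        self.vcpu_affinity[vcpu] = set(pcpus)
```

The design point is that the "host" answer is only ever given when the condition it relies on cannot be invalidated later.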

> >> Maybe, there is room for "fixing" this at this level, hooking up inside
> >> the scheduler code... but I'm shooting in the dark, without having checked
> >> whether and how this could really be feasible, aren't I?
> >
> > Uuh, I don't think a change to the scheduler on behalf of Xen would really
> > be appreciated. :-)
> >
I'm sure it would have been a true and giant nightmare!! :-D

> >> One thing I don't like about this approach is that it would potentially
> >> solve vNUMA and other scheduling anomalies, but...
> >>
> >>> The cpuid instruction is available in user mode as well.
> >>>
> >> ...it would not do any good for other subsystems, and user level code
> >> and apps.
> >
> > Indeed. I think the optimal solution would be two-fold: give the
> > scheduler the information it needs to react correctly, via a kernel
> > patch not relying on cpuid values, and fiddle with the cpuid values
> > from the xen tools according to the needs of other subsystems and/or
> > user code (e.g. licensing).
> 
So, just to check whether my understanding is correct: you'd like to add
an abstraction layer in Linux, in generic (or, perhaps, scheduling)
code, to hide the direct interaction with CPUID.
Such a layer, on bare metal, would just read CPUID while, on PV-ops, it
would check with Xen/match vNUMA/whatever... Is this what you are saying?

If yes, I think I like it...
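A minimal sketch of the abstraction layer as understood above, in Python for illustration only. `TopologyProvider`, `CpuidTopology`, `VnumaTopology` and `same_package` are hypothetical names, not Linux interfaces. The bare-metal backend decodes CPUID leaf 0xB (x2APIC extended topology: EAX[4:0] is the shift to the next level, ECX[15:8] the level type, EDX the x2APIC ID); since Python cannot execute CPUID, the register read is injected as a callable. The PV backend derives a flat per-node topology from a vNUMA map instead.

```python
from abc import ABC, abstractmethod


class TopologyProvider(ABC):
    """Hypothetical layer the scheduler queries instead of using CPUID directly."""

    @abstractmethod
    def ids(self, cpu):
        """Return (package_id, core_id, thread_id) for a logical CPU."""


class CpuidTopology(TopologyProvider):
    """Bare-metal backend: decode CPUID leaf 0xB (extended topology).

    `cpuid(cpu, leaf, subleaf)` must return the (eax, ebx, ecx, edx) values
    read on that CPU; it is injected because Python cannot execute CPUID.
    """

    def __init__(self, cpuid):
        self.cpuid = cpuid

    def ids(self, cpu):
        x2apic_id = self.cpuid(cpu, 0xB, 0)[3]   # EDX: x2APIC ID of this CPU
        shifts = {}
        for subleaf in range(8):                 # bounded walk over levels
            eax, _ebx, ecx, _edx = self.cpuid(cpu, 0xB, subleaf)
            level_type = (ecx >> 8) & 0xFF       # 1 = SMT, 2 = Core, 0 = end
            if level_type == 0:
                break
            shifts[level_type] = eax & 0x1F      # bits to shift to next level
        smt_shift = shifts.get(1, 0)
        core_shift = shifts.get(2, smt_shift)
        thread_id = x2apic_id & ((1 << smt_shift) - 1)
        core_id = (x2apic_id >> smt_shift) & ((1 << (core_shift - smt_shift)) - 1)
        return (x2apic_id >> core_shift, core_id, thread_id)


class VnumaTopology(TopologyProvider):
    """PV backend: flat topology inside each vNUMA node; CPUID is ignored."""

    def __init__(self, vcpu_to_vnode):
        self.vcpu_to_vnode = vcpu_to_vnode

    def ids(self, cpu):
        vnode = self.vcpu_to_vnode[cpu]
        peers = sorted(v for v, n in self.vcpu_to_vnode.items() if n == vnode)
        return (vnode, peers.index(cpu), 0)


def same_package(provider, a, b):
    """Example consumer: scheduler-side code never touches CPUID itself."""
    return provider.ids(a)[0] == provider.ids(b)[0]
```

A PV guest would instantiate `VnumaTopology` with the vNUMA layout obtained from Xen, bare metal `CpuidTopology`. User-space code that executes CPUID directly is not covered by such a layer, which is why adjusting the CPUID values from the tools is the other half of the proposal.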

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
