From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Dario Faggioli <dario.faggioli@citrix.com>
Cc: keir@xen.org, Ian.Campbell@citrix.com, lccycc123@gmail.com,
	george.dunlap@eu.citrix.com, msw@linux.com,
	stefano.stabellini@eu.citrix.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, JBeulich@suse.com,
	Wei Liu <wei.liu2@citrix.com>,
	Elena Ufimtseva <ufimtseva@gmail.com>
Subject: Re: Is: cpuid creation of PV guests is not correct.
Date: Tue, 22 Jul 2014 23:34:38 +0100
Message-ID: <53CEE6FE.4080101@citrix.com>
In-Reply-To: <20140722194301.GE2940@laptop.dumpdata.com>

On 22/07/2014 20:43, Konrad Rzeszutek Wilk wrote:
>> I.e., no matter how I pin the vcpus, the guest sees the 4 vcpus as if
>> they were all SMT siblings, within the same core, sharing all cache
>> levels.
> My recollection was that the setting of these CPUID values is
> tied in how the toolstack sees it - and since the toolstack
> runs in the initial domain - that is where it picks this data up.
>
> This problem had been discussed by Andrew Cooper at some point
> (Hackathon? Emails? IRC?) and moved under the 'fix cpuid creation/parsing'.
>
> I think that this issue should not affect Elena's patchset - 
> as the vNUMA is an innocent bystander that gets affected by this.
>
> As such changing the title.

There is a whole set of related issues with regard to cpuid under Xen
currently.  I investigated the problems from the point of view of
heterogeneous host feature levelling.  I do plan to work on these issues
(feature levelling is an important usecase for XenServer), and will do
so when the migration v2 work is complete.

However, to summarise the issues:

Xen's notion of a domain's cpuid policy was adequate for single-vcpu
VMs, but was never updated when multi-vcpu VMs were introduced.  There
is no concept of per-vcpu information in the policy, which is why all
the cache IDs you read are identical.
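
To make that concrete, here is a minimal sketch (using GCC's <cpuid.h>
helpers; not Xen code) of how a guest walks CPUID leaf 4, Intel's
deterministic cache parameters.  Run on each vcpu, it prints identical
sharing counts, which is exactly the "all SMT siblings" topology
described above:

    /* Minimal sketch: walk CPUID leaf 4 (Intel deterministic cache
     * parameters).  Because the domain policy has no per-vcpu data,
     * every vcpu sees the same answers here. */
    #include <stdio.h>
    #include <cpuid.h>  /* GCC/clang builtin wrappers */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx, i;

        for ( i = 0; ; i++ )
        {
            __cpuid_count(4, i, eax, ebx, ecx, edx);
            if ( (eax & 0x1f) == 0 )        /* type 0: no more caches */
                break;
            printf("L%u cache: shared by up to %u logical processors\n",
                   (eax >> 5) & 0x7,            /* EAX[7:5]: cache level */
                   ((eax >> 14) & 0xfff) + 1);  /* EAX[25:14] + 1 */
        }
        return 0;
    }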

The policy is theoretically controlled exclusively from the toolstack. 
The toolstack has the responsibility of setting the contents of any
leaves it believes the guest might be interested in, and Xen stores
these values wholesale.  If a cpuid query is requested of a domain which
lacks an entry for that specific leaf, the information is retrieved by
running a cpuid instruction, which is not necessarily deterministic.
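
In other words (hypothetical names below; this is a sketch of the
behaviour, not the actual Xen implementation), the per-domain lookup
amounts to:

    #include <stddef.h>
    #include <stdint.h>
    #include <cpuid.h>

    struct cpuid_leaf {
        uint32_t leaf, subleaf;
        uint32_t eax, ebx, ecx, edx;
    };

    struct domain_policy {
        const struct cpuid_leaf *ents;  /* toolstack-provided, stored wholesale */
        size_t nr;
    };

    static void domain_cpuid(const struct domain_policy *p,
                             uint32_t leaf, uint32_t subleaf,
                             struct cpuid_leaf *res)
    {
        size_t i;

        for ( i = 0; i < p->nr; i++ )
            if ( p->ents[i].leaf == leaf && p->ents[i].subleaf == subleaf )
            {
                *res = p->ents[i];
                return;
            }

        /* No entry for this leaf: execute a real cpuid instruction.  The
         * result depends on which pcpu we happen to be on, hence "not
         * necessarily deterministic". */
        res->leaf = leaf;
        res->subleaf = subleaf;
        __cpuid_count(leaf, subleaf, res->eax, res->ebx, res->ecx, res->edx);
    }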

The toolstack, under the cpuid policy of the domain it is running in,
attempts to guess the featureset to be offered to a domain, with
possible influence from user-specified domain configuration.  Xen
doesn't validate the featureset when the policy is set.  Instead, there
is veto/sanity code used on all accesses to the policy.  As a result,
the cpuid values as seen by the guest are not necessarily the same as
the values successfully set by the toolstack.

The various IDs which are obtained from cpuid inside a domain will
happen to be the IDs available to libxc when it was building the policy
for the domain.  For a regular PV dom0, these will be the IDs available
on the pcpu (or pcpus, given rescheduling) on which libxc was running.

Xen can completely control the values returned by the cpuid instruction
from HVM/PVH domains.  On the other hand, results for PV guests are
strictly opt-in via use of the Xen forced-emulation prefix.  As a
result, well-behaved PV kernels will see the policy, but regular
userspace applications in PV guests will see the native cpuid results.
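
For reference, the forced-emulation prefix is ud2a followed by the
ASCII bytes "xen", placed immediately before the cpuid instruction;
Linux PV kernels carry a definition along these lines.  A sketch (only
meaningful when executed inside a PV guest; anywhere else the ud2a
simply raises #UD):

    #include <stdint.h>

    /* ud2a (0x0f, 0x0b) followed by 'x', 'e', 'n'. */
    #define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; "

    static inline void xen_cpuid(uint32_t leaf, uint32_t subleaf,
                                 uint32_t *eax, uint32_t *ebx,
                                 uint32_t *ecx, uint32_t *edx)
    {
        /* Xen spots the prefix, skips it, and emulates the cpuid using
         * the domain's policy rather than letting it run natively. */
        asm volatile ( XEN_EMULATE_PREFIX "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "0" (leaf), "2" (subleaf) );
    }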

There are two caveats.  Intel Ivy Bridge (and later) hardware has
support for cpuid faulting, which allows Xen to regain exactly the same
level of control over PV guests as it has for HVM guests.  There are
also cpuid masking (Intel) / override (AMD) MSRs (which vary in
availability between processor generations) which allow the visible
featureset of any cpuid instruction to be altered.
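
For the cpuid faulting case, a rough sketch of the probe/enable
sequence (MSR numbers per the Intel SDM; rdmsr/wrmsr stand in for the
usual privileged accessors, which are assumed rather than defined
here):

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_PLATFORM_INFO               0x000000ce
    #define PLATFORM_INFO_CPUID_FAULTING    (1ULL << 31)
    #define MSR_MISC_FEATURES_ENABLES       0x00000140
    #define MISC_FEATURES_CPUID_FAULTING    (1ULL << 0)

    extern uint64_t rdmsr(uint32_t msr);            /* assumed privileged */
    extern void wrmsr(uint32_t msr, uint64_t val);  /* accessors          */

    static bool enable_cpuid_faulting(void)
    {
        /* Support is advertised in MSR_PLATFORM_INFO; absent on
         * pre-Ivy-Bridge parts, where the masking MSRs are the only
         * option. */
        if ( !(rdmsr(MSR_PLATFORM_INFO) & PLATFORM_INFO_CPUID_FAULTING) )
            return false;

        /* Once enabled, cpuid at CPL > 0 raises #GP, which Xen can
         * intercept and emulate against the domain policy. */
        wrmsr(MSR_MISC_FEATURES_ENABLES,
              rdmsr(MSR_MISC_FEATURES_ENABLES) | MISC_FEATURES_CPUID_FAULTING);
        return true;
    }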


I have some vague plans for how to fix these issues, which I will need
to see about designing sensibly in due course.  However, a brief
overview is something like this:

* Ownership of the entire domain policy resides with Xen rather than the
toolstack, and when a domain is created, it shall inherit from the host
setup, given appropriate per-domain-type restrictions.
* The toolstack may query and modify a domain's policy, with verification
of the modifications before they are accepted (a rough sketch of this
split follows).
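
A purely illustrative sketch of that split (every name below is
hypothetical; this is the shape of the idea, not a design):

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical flattened policy; real leaf data elided. */
    struct cpuid_policy {
        uint32_t leaf1_edx;  /* featureset bits, as an example */
    };

    struct domain {
        struct cpuid_policy policy;        /* owned by Xen */
        const struct cpuid_policy *host;   /* inherited at creation,
                                            * restricted by domain type */
    };

    /* Audit once, up front, instead of vetoing on every access. */
    static int domain_set_cpuid_policy(struct domain *d,
                                       const struct cpuid_policy *req)
    {
        if ( req->leaf1_edx & ~d->host->leaf1_edx )
            return -EINVAL;      /* rejecting features the host lacks */

        d->policy = *req;        /* accepted verbatim thereafter */
        return 0;
    }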

~Andrew
