From: Joao Martins <joao.m.martins@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Tim Deegan <tim@xen.org>, Ian Campbell <Ian.Campbell@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 05/30] xen/public: Export cpu featureset information in the public API
Date: Mon, 22 Feb 2016 18:50:19 +0000
Message-ID: <56CB586B.1080702@oracle.com>
In-Reply-To: <56C8BBCF.4030708@citrix.com>
On 02/20/2016 07:17 PM, Andrew Cooper wrote:
> On 20/02/16 17:39, Joao Martins wrote:
>>
>>>>>> and given that this
>>>>>> is exposed on both sysctl and libxl (through libxl_hwcap) shouldn't its size
>>>>>> match the real one (boot_cpu_data.x86_capability) i.e. NCAPINTS ? Additionally I
>>>>>> see that libxl_hwcap is also hardcoded to 8 alongside struct xen_sysctl_physinfo
>>>>>> when it should be 10 ?
>>>>> Hardcoding of the size in sysctl can be worked around. Fixing libxl is
>>>>> harder.
>>>>>
>>>>> The synthetic leaves are internal and should not be exposed.
>>>>>
>>>>>> libxl users could potentially make use of this hwcap field to see what features
>>>>>> the host CPU supports.
>>>>> The purpose of the new featureset interface is to have stable object
>>>>> which can be used by higher level toolstacks.
>>>>>
>>>>> This is done by pretending that hw_caps never existed, and replacing it
>>>>> wholesale with a bitmap, (specified as variable length and safe to
>>>>> zero-extend), with an ABI in the public header files detailing what each
>>>>> bit means.
>>>> Given that you introduce a new API for libxc (xc_get_cpu_featureset()) perhaps
>>>> an equivalent to libxl could also be added? That way users of libxl could also
>>>> query about the host and guests supported features. I would be happy to produce
>>>> patches towards that.
>>> In principle, this is fine. Part of this is covered by the xen-cpuid
>>> utility in a later patch.
>>>
>> OK.
>>
>>> Despite my plans to further rework guest cpuid handling, the principle
>>> of the {raw,host,pv,hvm}_featuresets is expected to stay, and be usable
>>> in their current form.
>> That's great to hear. The reason I brought this up is because libvirt has the
>> idea of cpu model and features associated with it (similar to qemu -cpu
>> XXX,+feature,-feature stuff but in an hypervisor agnostic manner that other
>> architectures can also use). libvirt could do mostly everything on its own, but
>> it still needs to know what the host supports. Based on that it then calculates
>> the lowest common denominator of cpu features to be enabled or masked out for
>> guests when comparing to an older family in a pool of servers. Though PV/HVM
>> (with{,out} hap/shadow) have different feature sets as you mention. So libvirt
>> might be thrown into error since a certain feature isn't sure to be set/masked
>> for a certain type of guest. So knowing those (i.e. {pv,hvm,...}_featuresets) in
>> advance lets libxl users make more reliable usage of the libxl cpuid policies to
>> more correctly normalize the cpuid for each type of guest.
>
> Does libvirt currently use hw_caps (and my series will inadvertently
> break it), or are you looking to do some new work for future benefit?
Yeah, but only one bit, i.e. PAE in word 0 (which is the only word that was kept
in the same place in your series). And yes, I am looking at this for future work
and trying to understand what's missing there. I do have a patch for libvirt to
parse your hw_caps, but given that it's not a stable format it might not make
sense to upstream it anymore.
>
> Sadly, cpuid levelling is a quagmire and not as simple as just choosing
> the common subset of bits. When I started this project I was expecting
> it to be bad, but nothing like as bad as it has turned out to be.
>
Indeed. Perhaps I overstated a bit before, when saying "libvirt could do mostly
everything on its own". It certainly doesn't deal with the issues you mention
below. I guess this would be the hypervisor part of it (the qemu/xen/vmware
module in libvirt). I expand a bit below on what libvirt deals with.
> As an example, the "deprecates fcs/fds" bit which is the subject of the
> "inverted" mask. The meaning of the bit is "hardware no longer supports
> x87 fcs/fds, and they are hardwired to zero".
>
> Originally, the point of the inverted mask was to make a "featureset"
> which could be levelled sensibly without specific knowledge of the
> meaning of each bit. This property is important for forwards
> compatibility, and avoiding unnecessary complexity in higher level
> toolstack components.
>
> However, with hindsight, attempting to level this bit is pointless. It
> is a statement about a change in pre-existing behaviour of an element of
> the cpu pipeline, and the pipeline behaviour will not change depending
> on how the bit is advertised to the guest. Another bit, "fdp exception
> only" is in a similar bucket.
>
> Other issues, which I haven't even tried to tackle in this series, are
> items such as the MXCSR mask. The real value cannot be levelled, is
> expected to remain constant after boot, and liable to induce #GP faults
> on fxrstor if it changes. Alternatively, there is EFER.LMSLE (long mode
> segment limit enable) which doesn't even have a feature bit to indicate
> availability (not that I can plausibly see an OS actually turning that
> feature on).
Woah, I wasn't aware of these levelling issues.
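To check my understanding of the "inverted" mask idea, here is a minimal
sketch in Python. It is not Xen code: the helper names and the bit position
used for the antifeature are made up for illustration, and the featureset is
modelled as a plain list of 32-bit words. Antifeature bits are flipped before
storing, so a toolstack can level by a plain AND (after zero-extending the
shorter bitmap) without knowing what any bit means.

```python
def zero_extend(featureset, nwords):
    """Pad a featureset (list of 32-bit words) with zero words up to nwords."""
    return list(featureset) + [0] * (nwords - len(featureset))


def store_inverted(words, inverted_mask):
    """Flip the bits named in inverted_mask (hypothetical antifeature bits).

    Applying it a second time restores the original representation.
    """
    mask = zero_extend(inverted_mask, len(words))
    return [w ^ m for w, m in zip(words, mask)]


def level(featuresets):
    """Common subset of several featuresets: zero-extend, then bitwise AND."""
    nwords = max(len(fs) for fs in featuresets)
    result = [0xFFFFFFFF] * nwords
    for fs in featuresets:
        padded = zero_extend(fs, nwords)
        result = [a & b for a, b in zip(result, padded)]
    return result
```

With a mask covering one antifeature bit, a host that sets it in hardware and
a host that does not level to "set" once the result is flipped back, which is
the conservative answer. As you point out, that still doesn't change what the
pipeline actually does, so this only illustrates the bookkeeping.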
>
> A toolstack needs to handle all of:
> * The maximum "configuration" available to a guest on the available servers.
> * Which bits of that can be controlled, and which will simply leak through.
> * What the guest actually saw when it booted.
>
> (I use configuration here to include items such as max leaf, max phys
> addr, etc which are important to be levelled, but not included in the
> plain feature bits in cpuid).
>
> My longterm plans involve:
> * Having Xen construct a full "maximum" cpuid policy, rather than just a
> featureset.
> * Per-domain cpuid policy, seeded from maximum on domain_create, and
> modified where appropriate (e.g. hap vs shadow, PV guest switching
> between native and compat mode).
> * All validity checking for updates in the set_cpuid hypercall rather
> than being deferred to the cpuid intercept point.
> * A get_cpuid hypercall so a toolstack can actually retrieve the policy
> a guest will see.
>
> Even further work involves:
> * Put all this information into the migration stream, rather than having
> it regenerated by the destination toolstack.
> * MSR levelling.
>
> But that is a huge quantity more work, which is why this series focuses
> just on the featureset alone, in the hope that the featureset is still a
> useful discrete item outside the context of a full cpuid policy.
>
> I guess my question at the end of all this is what libvirt currently
> handles of all of this?
Hm, libvirt is a high level toolstack (meaning higher than libxl) and doesn't
deal with these things at this level of detail, at least AFAICT. It has the idea
of a cpu model and its features, originally borrowed from qemu, as a way of
representing the features of each type of host. Each supported hypervisor in
libvirt deals with this in its own way.
It has a cpu map per architecture[0] (x86/ppc only) that describes, for example,
what each family of CPUs (Penryn, Broadwell, Opteron, etc.) looks like. It also
describes how the features can be checked: on x86, these features are described
by CPUID leaf, subleaf and output register, as you might imagine. Note that the
admin can change these, define custom ones, and exclude features from them too.
With these defined, you combine the common features and a model to create the
*guest* CPU definition. When bootstrapping the hypervisor driver, libvirt looks
for the most similar model and appends any unmatched features to it to form the
host cpu model. The same algorithm is used when comparing a newer family to an
older one in a pool of servers, i.e. comparing cpu definitions.
[This could be viewed the same as the items you included above:
* The maximum "configuration" available to a guest on the available servers.
* Which bits of that can be controlled, and which will simply leak through.
Though it wouldn't deal with the configuration as you say, only with the
features, deferring the rest to the underlying hypervisor libraries in use?]
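Roughly, and only as a sketch of the idea rather than libvirt's actual
implementation, the model matching could look like the Python below. The
function name, the model names and the feature sets are all hypothetical;
features are plain strings instead of CPUID leaf/register descriptions.

```python
def closest_model(host_features, models):
    """Pick the largest model fully covered by the host.

    Returns (model_name, extra_features), where extra_features are the host
    features the chosen model does not already include. model_name is None
    if no model fits.
    """
    best_name, best_feats = None, frozenset()
    for name, feats in models.items():
        fits = feats <= host_features
        if fits and (best_name is None or len(feats) > len(best_feats)):
            best_name, best_feats = name, feats
    return best_name, sorted(host_features - best_feats)


def common_features(hosts):
    """Features present on every host in a pool (plain set intersection)."""
    return frozenset.intersection(*hosts)
```

For a pool, one would first take common_features() over all hosts and then
call closest_model() on the result. This covers feature bits only, not the
rest of the "configuration" (max leaf, max phys addr, etc.).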
In addition there are policies attached to each feature: "force", "require",
"disable", "optional", "forbid". There are also policies to describe how you
want to match the cpu model you are describing, such as a *minimum* set of
features or an *exact* match of the features described. When booting the guest,
libvirt then checks whether all the features are actually there and whether
everything complies with the feature policies.
[This could be viewed the same as the items you included above:
* What the guest actually saw when it booted.]
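As a sketch of how the per-feature policies might be evaluated (this is my
reading of their meaning, not libvirt's code; the function name and data
layout are assumptions):

```python
def apply_policies(host_features, policies):
    """Evaluate per-feature policies against the host's features.

    policies maps a feature name to one of the five policy strings.
    Returns (guest_features, errors); a non-empty errors list means the
    guest should not be started.
    """
    guest, errors = set(), []
    for feature, policy in policies.items():
        present = feature in host_features
        if policy == "force":
            guest.add(feature)            # advertised regardless of the host
        elif policy == "require":
            if present:
                guest.add(feature)
            else:
                errors.append(f"required feature missing: {feature}")
        elif policy == "optional":
            if present:
                guest.add(feature)        # advertised only if the host has it
        elif policy == "disable":
            pass                          # never advertised to the guest
        elif policy == "forbid":
            if present:
                errors.append(f"forbidden feature present: {feature}")
        else:
            raise ValueError(f"unknown policy: {policy}")
    return guest, errors
```

Note that "disable" relies on the hypervisor actually being able to mask the
bit, which ties back to the "which bits will simply leak through" item.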
[0]
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml;h=0b6d424db4bdaef7925a2faf7b881f104b1ef4e5;hb=HEAD
> We certainly can wire the featureset
> information through libxl, but it is insufficient in the general case
> for making migration safe.
Right, with the info and plans you just described I guess some of the things
aren't there yet and it would be a lot of "guesswork".
Thanks!
Joao