[Qemu-devel] Qemu, libvirt, and CPU models

* [Qemu-devel] Qemu, libvirt, and CPU models
@ 2012-03-06 18:27 Eduardo Habkost
  2012-03-07 14:18 ` [Qemu-devel] [libvirt] " Daniel P. Berrange
  0 siblings, 1 reply; 8+ messages in thread
From: Eduardo Habkost @ 2012-03-06 18:27 UTC
  To: libvir-list, qemu-devel

Hi,

Sorry for the long message, but I didn't find a way to summarize the
questions and issues and make it shorter.

For people who don't know me: I have started to work recently on the
Qemu CPU model code. I have been looking at how things work on
libvirt+Qemu today w.r.t. CPU models, and I have some points I would
like to understand better and see if they can be improved.

I have two main points I would like to understand/discuss:

1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
   definitions.

2) How we could properly allow CPU models to be changed without breaking
   existing virtual machines?

Note that for all the questions below, I don't expect that we design the
whole solution and discuss every single detail in this thread. I just
want to collect suggestions, information about libvirt requirements and
assumptions, and warnings about expected pitfalls before I start working
on a solution on Qemu.

1) Qemu and cpu_map.xml

I would like to understand how cpu_map.xml is supposed to be used, and
how it is supposed to interact with the CPU model definitions provided
by Qemu. More precisely:

1.1) Do we want to eliminate the duplication between the Qemu CPU
  definitions and cpu_map.xml?

1.1.1) If we want to eliminate the duplication, how can we accomplish
  that? What interfaces are you missing that Qemu could provide?

1.1.2) If the duplication has a purpose and you want to keep
  cpu_map.xml, then:
  - First, I would like to understand why libvirt needs cpu_map.xml? Is
    it part of the "public" interface of libvirt, or is it just an
    internal file where libvirt stores non-user-visible data?
  - How can we make sure there is no confusion between libvirt and Qemu
    about the CPU models? For example, what if cpu_map.xml says model
    'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
    guarantee that libvirt gets exactly what it expects from Qemu when
    it asks for a CPU model? We have "-cpu ?dump" today, but it's not
    the best interface we could have. Is there anything in particular
    you are missing in the Qemu<->libvirt interface that would help here?

1.2) About the probing of available features on the host system: Qemu
  has code specialized to query KVM about the available features, and to
  check what can be enabled and what can't be enabled in a VM. In many
  cases, the available features match exactly what is returned by the
  CPUID instruction on the host system, but there are some
  exceptions:
  - Some features can be enabled even when the host CPU doesn't support
    them (because they are completely emulated by KVM, e.g. x2apic).
  - In many other cases, the feature may be available but we have to
    check if Qemu+KVM are really able to expose it to the guest (many
    features work this way, as many depend on specific support by the
    KVM kernel module and/or Qemu).

  I suppose libvirt does want to check which flags can be enabled in a
  VM, as it already has checks for host CPU features (e.g.
  src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
  doesn't want to duplicate the KVM feature probing code present on
  Qemu, and in this case we could have an interface where libvirt could
  query for the actually-available CPU features. Would it be useful for
  libvirt? What's the best way to expose this interface?
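
The probing rule described in 1.2 can be sketched as plain set arithmetic. This is only a minimal illustration in Python, not Qemu's actual probing code: the function name `guest_available_features` and its three input sets are hypothetical stand-ins for whatever Qemu would really query from KVM.

```python
def guest_available_features(host_cpuid, kvm_supported, kvm_emulated):
    """Return the set of feature flags that could be exposed to a guest.

    host_cpuid:    flags reported by the CPUID instruction on the host
    kvm_supported: flags that Qemu+KVM know how to expose to a guest
    kvm_emulated:  flags KVM emulates regardless of the host CPU
    All three inputs are hypothetical; real probing is more involved.
    """
    # A host flag is only usable if Qemu+KVM can actually expose it.
    passthrough = set(host_cpuid) & set(kvm_supported)
    # Fully emulated flags are usable even when the host lacks them.
    return passthrough | set(kvm_emulated)


host = {"sse2", "ssse3", "avx"}
supported = {"sse2", "ssse3", "x2apic"}
emulated = {"x2apic"}
print(sorted(guest_available_features(host, supported, emulated)))
```

In this made-up example "avx" is dropped (present on the host, but not exposable) and "x2apic" is kept (absent on the host, but emulated), which are the two exceptions listed above.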

1.3) Some features are not plain CPU feature bits: e.g. level=X can be
  set in "-cpu" argument, and other features are enabled/disabled by
  exposing specific CPUID leafs and not just a feature bit (e.g. PMU
  CPUID leaf support). I suppose libvirt wants to be able to probe for
  those features too, and be able to enable/disable them, right?

2) How to change an existing model and keep existing VMs working?

Sometimes we have to update a CPU model definition because of some bug.
Examples:

- The CPU models Conroe, Penryn and Nehalem have level=2 set. This
  works most of the time, but it breaks CPU core/thread topology enumeration.
  We have to change those CPU models to use level=4 to fix the bug.

- This can happen with plain CPU feature bits, too, not just "level":
  sometimes real-world CPU models have a feature that is not supported
  by Qemu+KVM yet, but when the kernel and Qemu finally start to
  support it, we may want to enable it on existing CPU models. Sometimes
  a model simply has the wrong set of feature bits, and we have to fix
  it to have the right set of features.

But if we simply change the existing model definition, this will break
existing machines:

- Today, it would break on live migration, but that's relatively easy to
  fix: we have to migrate the CPUID information too, to make sure we
  won't change the CPU under the guest OS's feet.

- Even if we fix live migration, simple "cold" migration will make the
  guest OS see a different CPU after a reboot, and that's undesirable
  too. Even if the Qemu developers disagree with me and decide that this
  is not a problem, libvirt may want to expose a more stable CPU to the
  guest, and some cooperation from Qemu would be necessary.

So, my questions are:

About the libvirt<->Qemu interface:

2.1) What's the best mechanism to have different versions of a CPU
  model? An alias system like the one used by machine-types? How to
  implement this without confusing the existing libvirt probing code?

2.2) We have to make the CPU model version-choosing mechanism depend on
  the machine-type. e.g. if the user has a pc-1.0 machine using the
  Nehalem CPU model, we have to keep using the level=2 version of that
  CPU. But if the user chose a newer machine-type version, we can safely
  get the latest-and-greatest version of the Nehalem CPU model. How to
  make this work without confusing libvirt?
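
One way to picture 2.2 is a per-machine-type compatibility table applied on top of the latest model definition. The sketch below is an assumption about how such a mechanism could look, not an existing Qemu interface; `LATEST_MODELS`, `MACHINE_COMPAT`, `resolve_cpu_model` and the "pc-1.1" machine-type are all hypothetical names used for illustration.

```python
# Hypothetical: the newest definition of each CPU model.
LATEST_MODELS = {"Nehalem": {"level": 4}}

# Hypothetical: per-machine-type overrides that preserve old behaviour.
MACHINE_COMPAT = {
    "pc-1.0": {"Nehalem": {"level": 2}},
    "pc-1.1": {},
}


def resolve_cpu_model(machine_type, model):
    """Return the model definition a given machine-type should see."""
    definition = dict(LATEST_MODELS[model])
    overrides = MACHINE_COMPAT.get(machine_type, {}).get(model, {})
    definition.update(overrides)
    return definition


print(resolve_cpu_model("pc-1.0", "Nehalem"))
print(resolve_cpu_model("pc-1.1", "Nehalem"))
```

With this layout the unversioned name "Nehalem" stays the only name the caller uses, and the machine-type alone decides which version is picked.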

About the user<->libvirt interface:

2.3) How will all this interact with cpu_map.xml? Right now there's the
  assumption that the CPU model definitions are immutable, right?

2.4) How do you think libvirt would expose this "CPU model version"
  to the user? Should it just expose the unversioned CPU models to the
  user, and let Qemu or libvirt choose the right version based on
  machine-type?  Should it expose only the versioned CPU models (because
  they are immutable) and never expose the unversioned aliases? Should
  it expose the unversioned alias, but change the Domain XML definition
  automatically to the versioned immutable one (like it happens with
  machine-type)?

I don't plan to interfere with the libvirt interface design, but I suppose
that libvirt design assumptions will be impacted by the solution we
choose on Qemu. For example: right now libvirt seems to assume that CPU
models are immutable. Are you going to keep this assumption in the
libvirt interfaces? Because I am already willing to break this
assumption on Qemu, although I would like to cooperate with libvirt and
not break any requirements/assumptions without warning.

-- 
Eduardo

* Re: [Qemu-devel] [libvirt] Qemu, libvirt, and CPU models
  2012-03-06 18:27 [Qemu-devel] Qemu, libvirt, and CPU models Eduardo Habkost
@ 2012-03-07 14:18 ` Daniel P. Berrange
  2012-03-07 22:26   ` Eduardo Habkost
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrange @ 2012-03-07 14:18 UTC
  To: libvir-list, qemu-devel

On Tue, Mar 06, 2012 at 03:27:53PM -0300, Eduardo Habkost wrote:
> Hi,
> 
> Sorry for the long message, but I didn't find a way to summarize the
> questions and issues and make it shorter.
> 
> For people who don't know me: I have started to work recently on the
> Qemu CPU model code. I have been looking at how things work on
> libvirt+Qemu today w.r.t. CPU models, and I have some points I would
> like to understand better and see if they can be improved.
> 
> I have two main points I would like to understand/discuss:
> 
> 1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
>    definitions.

We have several areas of code in which we use CPU definitions

 - Reporting the host CPU definition (virsh capabilities)
 - Calculating host CPU compatibility / baseline definitions
 - Checking guest / host CPU compatibility
 - Configuring the guest CPU definition

libvirt targets multiple platforms, and our CPU handling code is designed
to be common & sharable across all the libvirt drivers, VMWare, Xen, KVM,
LXC, etc. Obviously for container based virt, only the host side of things
is relevant.

The libvirt CPU XML definition consists of

 - Model name
 - Vendor name
 - zero or more feature flags added/removed.

A model name is basically just an alias for a bunch of feature flags,
so that the CPU XML definitions are a) reasonably short b) have
some sensible default baselines.
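
As a rough illustration of "model name as an alias for a bunch of feature flags", the following Python sketch expands a model name plus added/removed flags into an effective flag set. The `CPU_MAP` contents and the function `expand_cpu_definition` are hypothetical; the model 'Moo' and flag 'foo' are the made-up names used earlier in this thread, not real cpu_map.xml entries.

```python
# Hypothetical stand-in for the model database in cpu_map.xml.
CPU_MAP = {
    "Moo": {"foo", "bar"},
}


def expand_cpu_definition(model, add=(), remove=()):
    """Expand a model name plus flag adjustments into a full flag set."""
    flags = set(CPU_MAP[model])   # start from the model's baseline
    flags |= set(add)             # flags explicitly added
    flags -= set(remove)          # flags explicitly removed
    return flags


print(sorted(expand_cpu_definition("Moo", add=["baz"], remove=["bar"])))
```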

The cpu_map.xml is the database of the CPU models that libvirt
supports. We use this database to transform the CPU definition
from the guest XML, into the hypervisor's own format.

As luck would have it, the cpu_map.xml file contents match what
QEMU has. This need not be the case though. If there is a model
in the libvirt cpu_map.xml that QEMU doesn't know, we'll just
pick the nearest matching QEMU cpu model & specify the feature
flags to compensate. We could go one step further and just write
out a cpu.conf file that we load in QEMU with -readconfig.
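
A minimal sketch of the "nearest model plus compensating flags" idea, assuming both sides can be reduced to sets of flag names. This is not libvirt's actual algorithm; `nearest_model` and the trimmed flag lists are illustrative only, and the output uses the "-cpu model,+flag,-flag" syntax.

```python
def nearest_model(wanted, qemu_models):
    """Build a -cpu argument for the Qemu model closest to `wanted`.

    Closeness is the number of flags that differ (symmetric difference);
    ties are broken alphabetically by model name.
    """
    def distance(name):
        return len(wanted ^ qemu_models[name])

    best = min(sorted(qemu_models), key=distance)
    have = qemu_models[best]
    parts = [best]
    parts += ["+" + flag for flag in sorted(wanted - have)]
    parts += ["-" + flag for flag in sorted(have - wanted)]
    return ",".join(parts)


# Illustrative, heavily trimmed flag lists (not the real definitions).
QEMU_MODELS = {
    "Conroe": {"sse2", "ssse3"},
    "Nehalem": {"sse2", "ssse3", "sse4.1", "sse4.2", "popcnt"},
}

wanted = {"sse2", "ssse3", "sse4.1", "sse4.2", "popcnt", "x2apic"}
print(nearest_model(wanted, QEMU_MODELS))
```

Note that this only works if libvirt knows what Qemu's definition of each model really contains, which is exactly the probing question raised in this thread.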

On Xen we would use the cpu_map.xml to generate the CPUID
masks that Xen expects. Similarly for VMWare.
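
To give a feel for what generating CPUID masks from flag names involves, here is a small Python sketch that turns flag names into bitmasks for CPUID leaf 1. The bit positions are the architectural ones for that leaf; the table is a small subset, and the function name `cpuid_leaf1_masks` and the returned dictionary layout are hypothetical. It does not reproduce the mask syntax Xen or VMWare actually expect.

```python
# Subset of CPUID leaf 1 feature bits: flag name -> (register, bit).
CPUID_1_BITS = {
    "sse":    ("edx", 25),
    "sse2":   ("edx", 26),
    "pni":    ("ecx", 0),    # SSE3
    "ssse3":  ("ecx", 9),
    "sse4.1": ("ecx", 19),
    "sse4.2": ("ecx", 20),
    "x2apic": ("ecx", 21),
    "popcnt": ("ecx", 23),
}


def cpuid_leaf1_masks(flags):
    """Return {'ecx': mask, 'edx': mask} with one bit set per flag."""
    masks = {"ecx": 0, "edx": 0}
    for flag in flags:
        register, bit = CPUID_1_BITS[flag]
        masks[register] |= 1 << bit
    return masks


masks = cpuid_leaf1_masks({"sse2", "x2apic"})
print({reg: hex(value) for reg, value in masks.items()})
```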

> 2) How we could properly allow CPU models to be changed without breaking
>    existing virtual machines?

What is the scope of changes expected to CPU models ?

> 1) Qemu and cpu_map.xml
> 
> I would like to understand how cpu_map.xml is supposed to be used, and
> how it is supposed to interact with the CPU model definitions provided
> by Qemu. More precisely:
> 
> 1.1) Do we want to eliminate the duplication between the Qemu CPU
>   definitions and cpu_map.xml?

It isn't possible for us to eliminate the libvirt cpu_map.xml, since we
need that across all our hypervisor targets.

> 1.1.1) If we want to eliminate the duplication, how can we accomplish
>   that? What interfaces you miss, that Qemu could provide?
> 
> 1.1.2) If the duplication has a purpose and you want to keep
>   cpu_map.xml, then:
>   - First, I would like to understand why libvirt needs cpu_map.xml? Is
>     it part of the "public" interface of libvirt, or is it just an
>     internal file where libvirt stores non-user-visible data?
>   - How can we make sure there is no confusion between libvirt and Qemu
>     about the CPU models? For example, what if cpu_map.xml says model
>     'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
>     guarantee that libvirt gets exactly what it expects from Qemu when
>     it asks for a CPU model? We have "-cpu ?dump" today, but it's not
>     the better interface we could have. Do you miss something in special
>     in the Qemu<->libvirt interface, to help on that?
> 
> 1.2) About the probing of available features on the host system: Qemu
>   has code specialized to query KVM about the available features, and to
>   check what can be enabled and what can't be enabled in a VM. On many
>   cases, the available features match exactly what is returned by the
>   CPUID instruction on the host system, but there are some
>   exceptions:
>   - Some features can be enabled even when the host CPU doesn't support
>     it (because they are completely emulated by KVM, e.g. x2apic).
>   - On many other cases, the feature may be available but we have to
>     check if Qemu+KVM are really able to expose it to the guest (many
>     features work this way, as many depend on specific support by the
>     KVM kernel module and/or Qemu).
>   
>   I suppose libvirt does want to check which flags can be enabled in a
>   VM, as it already have checks for host CPU features (e.g.
>   src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
>   doesn't want to duplicate the KVM feature probing code present on
>   Qemu, and in this case we could have an interface where libvirt could
>   query for the actually-available CPU features. Would it be useful for
>   libvirt? What's the best way to expose this interface?
> 
> 1.3) Some features are not plain CPU feature bits: e.g. level=X can be
>   set in "-cpu" argument, and other features are enabled/disabled by
>   exposing specific CPUID leafs and not just a feature bit (e.g. PMU
>   CPUID leaf support). I suppose libvirt wants to be able to probe for
>   those features too, and be able to enable/disable them, right?


The libvirt CPU definition does not currently store info about the
level, family, model, stepping, xlevel or model_id items. We really
ought to fix this, so that libvirt does have that info. Then we'd
be able to write out a QEMU config that fully specified the exact
model.

> 2) How to change an existing model and keep existing VMs working?
> 
> Sometimes we have to update a CPU model definition because of some bug.
> Examples:
> 
> - The CPU models Conroe, Penryn and Nehalem, have level=2 set. This
>   works most times, but it breaks CPU core/thread topology enumeration.
>   We have to change those CPU models to use level=4 to fix the bug.

This is an example of why libvirt needs to represent the level/family
etc in its CPU definition. That way, when a guest is first created,
the XML will save the CPU model, feature flags, level, family, etc
it is created with. Should the level be changed later, existing guests
would then not be affected, only new guests would get the level=4
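
The "save everything at creation time" approach can be sketched as taking a snapshot of the model definition when the guest is defined. The Python below is an illustration under that assumption; `create_guest`, the dictionary layout and the flag list are hypothetical and do not reflect libvirt's real XML schema.

```python
import copy

# Hypothetical model database with the extra fields discussed above.
MODEL_DB = {"Nehalem": {"level": 2, "features": ["sse2", "ssse3"]}}


def create_guest(name, model, model_db):
    """Create a guest config that pins the current model definition."""
    # Deep copy so later edits to model_db cannot leak into this guest.
    pinned = copy.deepcopy(model_db[model])
    return {"name": name, "cpu": {"model": model, **pinned}}


old_guest = create_guest("old", "Nehalem", MODEL_DB)
MODEL_DB["Nehalem"]["level"] = 4          # the model is fixed later
new_guest = create_guest("new", "Nehalem", MODEL_DB)

print(old_guest["cpu"]["level"], new_guest["cpu"]["level"])
```

The existing guest keeps level=2 while the newly created one picks up level=4, even though both refer to the same unversioned model name.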

> - This can happen with plain CPU feature bits, too, not just "level":
>   sometimes real-world CPU models have a feature that is not supported
>   by Qemu+KVM yet, but when the kernel and Qemu finally starts to
>   support it, we may want to enable it on existing CPU models. Sometimes
>   a model simply has the wrong set of feature bits, and we have to fix
>   it to have the right set of features.

> 2.3) How all this will interact with cpu_map.xml? Right now there's the
>   assumption that the CPU model definitions are immutable, right?
> 
> 2.4) How do you think libvirt would expose this "CPU model version"
>   to the user? Should it just expose the unversioned CPU models to the
>   user, and let Qemu or libvirt choose the right version based on
>   machine-type?  Should it expose only the versioned CPU models (because
>   they are immutable) and never expose the unversioned aliases? Should
>   it expose the unversioned alias, but change the Domain XML definition
>   automatically to the versioned immutable one (like it happens with
>   machine-type)?

We should only expose unversioned CPU models, but then record the
precise details of the current version in the guest XML.

> I don't plan to interfere on the libvirt interface design, but I suppose
> that libvirt design assumptions will be impacted by the solution we
> choose on Qemu. For example: right now libvirt seems to assume that CPU
> models are immutable. Are you going to keep this assumption in the
> libvirt interfaces? Because I am already willing to break this
> assumption on Qemu, although I would like to cooperate with libvirt and
> not break any requirements/assumptions without warning.


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

* Re: [Qemu-devel] [libvirt] Qemu, libvirt, and CPU models
  2012-03-07 14:18 ` [Qemu-devel] [libvirt] " Daniel P. Berrange
@ 2012-03-07 22:26   ` Eduardo Habkost
  2012-03-07 23:07     ` Eric Blake
  2012-03-08 13:41     ` Jiri Denemark
  0 siblings, 2 replies; 8+ messages in thread
From: Eduardo Habkost @ 2012-03-07 22:26 UTC
  To: Daniel P. Berrange; +Cc: libvir-list, qemu-devel

Thanks a lot for the explanations, Daniel.

Comments about specific items inline.

On Wed, Mar 07, 2012 at 02:18:28PM +0000, Daniel P. Berrange wrote:
> > I have two main points I would like to understand/discuss:
> > 
> > 1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
> >    definitions.
> 
> We have several areas of code in which we use CPU definitions
> 
>  - Reporting the host CPU definition (virsh capabilities)
>  - Calculating host CPU compatibility / baseline definitions
>  - Checking guest / host CPU compatibility
>  - Configuring the guest CPU definition
> 
> libvirt targets multiple platforms, and our CPU handling code is designed
> to be common & sharable across all the libvirt drivers, VMWare, Xen, KVM,
> LXC, etc. Obviously for container based virt, only the host side of things
> is relevant.
> 
> The libvirt CPU XML definition consists of
> 
>  - Model name
>  - Vendor name
>  - zero or more feature flags added/removed.
> 
> A model name is basically just an alias for a bunch of feature flags,
> so that the CPU XML definitions are a) reasonably short b) have
> some sensible default baselines.
> 
> The cpu_map.xml is the database of the CPU models that libvirt
> supports. We use this database to transform the CPU definition
> from the guest XML, into the hypervisor's own format.

Understood. Makes sense.

> 
> As luck would have it, the cpu_map.xml file contents match what
> QEMU has. This need not be the case though. If there is a model
> in the libvirt cpu_map.xml that QEMU doesn't know, we'll just
> pick the nearest matching QEMU cpu model & specify the feature
> flags to compensate.

Awesome. So, if Qemu and libvirt disagree, libvirt will know that and
add the necessary flags? That was my main worry. If disagreement between
Qemu and libvirt is not a problem, it would make things much easier.

...but:

Is that really implemented? I simply don't see libvirt doing that. I see
code that calls "-cpu ?" to list the available CPU models, but no code
calling "-cpu ?dump", or parsing the Qemu CPU definition config file. I
even removed some random flags from the Nehalem model on my machine
(running Fedora 16), and no additional flags were added.


> We could go one step further and just write
> out a cpu.conf file that we load in QEMU with -readconfig.

Sounds good. Anyway, I want everything that is configurable in the
cpudef config file to be configurable on the command line too, so both
options (command-line or config file) would work.

> 
> On Xen we would use the cpu_map.xml to generate the CPUID
> masks that Xen expects. Similarly for VMWare.
> 
> > 2) How we could properly allow CPU models to be changed without breaking
> >    existing virtual machines?
> 
> What is the scope of changes expected to CPU models ?

We already have at least four cases, affecting different fields of the
CPU definitions:

A) Adding/removing flags. Examples:
   - When the current set of flags is simply incorrect. See commit df07ec56
     on qemu.git, where lots of flags that weren't supposed to be set
     were removed from some models.
   - When a new feature is now supported by Qemu+KVM and it's present
     on real-world CPUs, but our CPU definitions don't have the feature
     yet. e.g. x2apic, that is present on real-world Westmere CPUs but
     disabled on Qemu Westmere CPU definition.
B) Changing "level" for some reason. One example: Conroe, Penryn and
   Nehalem have level=2, but need to have level>=4 to make CPU topology
   work, so they have to be changed.
C) Enabling/disabling or overriding specific CPUID leafs. This isn't
   even configurable on the config files today, but I plan to allow it
   to be configured, otherwise users won't be able to enable/disable
   some features that are probed by the guest by simply looking at a
   CPUID leaf (e.g. the 0xA CPUID leaf that contains PMU information).

The PMU leaf is an example where a CPU looks different by simply using a
different Qemu or kernel version, and libvirt can't control the
visibility of that feature to the guest:

- If you start a Virtual Machine using Qemu-1.0 today, with the "pc-1.0"
  machine-type, the PMU CPUID leaf won't be visible to the guest
  (as Qemu-1.0 doesn't support the PMU leaf).

- If you start a Virtual Machine using Qemu-1.1 in the future, using the
  "pc-1.1" machine-type, with a recent kernel, the PMU CPUID leaf _will_
  be visible to the guest (as the qemu.git master branch supports it).

So far this is OK, because the machine-type in theory helps us control
the feature, but we have a problem in this case:

- If you start a Virtual Machine using Qemu-1.1 in the future, using the
  "pc-1.1" machine-type, using exactly the same command-line as above,
  but using an old kernel, the PMU CPUID leaf will _not_ be visible to
  the guest.
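
The three scenarios above can be condensed into a tiny model. This Python sketch is only a way of restating them: `pmu_leaf_visible` is hypothetical, and the assumption that the "pc-1.0" machine-type hides the leaf is taken from the "in theory" remark above rather than from actual Qemu behaviour.

```python
def pmu_leaf_visible(qemu_version, machine_type, kernel_has_pmu):
    """Hypothetical model of whether the guest sees CPUID leaf 0xA."""
    qemu_has_pmu = qemu_version >= (1, 1)          # Qemu-1.0 lacks support
    machine_allows_pmu = machine_type != "pc-1.0"  # assumed compat rule
    return qemu_has_pmu and machine_allows_pmu and kernel_has_pmu


scenarios = [
    ((1, 0), "pc-1.0", True),    # Qemu-1.0
    ((1, 1), "pc-1.1", True),    # Qemu-1.1, recent kernel
    ((1, 1), "pc-1.1", False),   # Qemu-1.1, old kernel
]
for qemu_version, machine_type, kernel_has_pmu in scenarios:
    print(qemu_version, machine_type, kernel_has_pmu,
          pmu_leaf_visible(qemu_version, machine_type, kernel_has_pmu))
```

The last two scenarios share the same Qemu version and machine-type but give different results, so the command line alone does not determine what the guest sees.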


> 
> > 1) Qemu and cpu_map.xml
> > 
> > I would like to understand how cpu_map.xml is supposed to be used, and
> > how it is supposed to interact with the CPU model definitions provided
> > by Qemu. More precisely:
> > 
> > 1.1) Do we want to eliminate the duplication between the Qemu CPU
> >   definitions and cpu_map.xml?
> 
> It isn't possible for us to eliminate the libvirt cpu_map.xml, since we
> need that across all our hypervisor targets.

OK, as you already explained. It's not a problem for me as long as things
work as expected when Qemu and libvirt disagree about a CPU model
definition.


So, about the specific questions:

> > 1.1.1) If we want to eliminate the duplication, how can we accomplish
> >   that? What interfaces you miss, that Qemu could provide?
> > 
> > 1.1.2) If the duplication has a purpose and you want to keep
> >   cpu_map.xml, then:
> >   - First, I would like to understand why libvirt needs cpu_map.xml? Is
> >     it part of the "public" interface of libvirt, or is it just an
> >     internal file where libvirt stores non-user-visible data?

You answered that above.

> >   - How can we make sure there is no confusion between libvirt and Qemu
> >     about the CPU models? For example, what if cpu_map.xml says model
> >     'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
> >     guarantee that libvirt gets exactly what it expects from Qemu when
> >     it asks for a CPU model? We have "-cpu ?dump" today, but it's not
> >     the better interface we could have. Do you miss something in special
> >     in the Qemu<->libvirt interface, to help on that?

So, it looks like either I am missing something in my tests or libvirt
is _not_ probing the Qemu CPU model definitions to make sure libvirt
gets all the features it expects.

Also, I would like to ask if you have suggestions to implement
the equivalent of "-cpu ?dump" in a more friendly and extensible way.
Would a QMP command be a good alternative? Would a command-line option
with json output be good enough?
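
To make the JSON option more concrete, here is a sketch of how a client could check a model against its expectations if such output existed. The JSON layout in `SAMPLE_DUMP` is entirely hypothetical (no such format is defined in this thread), as is the function `missing_features`; 'Moo' and 'foo' are the made-up names from the question above.

```python
import json

# Hypothetical machine-readable dump of CPU model definitions.
SAMPLE_DUMP = """
[
  {"name": "Moo", "level": 2, "features": ["foo", "bar"]}
]
"""


def missing_features(dump_text, model, expected):
    """List flags the client expects in `model` but the dump lacks."""
    models = {entry["name"]: entry for entry in json.loads(dump_text)}
    actual = set(models[model]["features"])
    return sorted(set(expected) - actual)


# The client believes 'Moo' has 'foo' and 'baz'; the dump disagrees on 'baz'.
print(missing_features(SAMPLE_DUMP, "Moo", {"foo", "baz"}))
```

The flags returned are the ones a client would have to add explicitly to get the CPU it expects.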

(Do we have any case of capability-querying being made using QMP before
starting any actual VM, today?)


> > 1.2) About the probing of available features on the host system: Qemu
> >   has code specialized to query KVM about the available features, and to
> >   check what can be enabled and what can't be enabled in a VM. On many
> >   cases, the available features match exactly what is returned by the
> >   CPUID instruction on the host system, but there are some
> >   exceptions:
> >   - Some features can be enabled even when the host CPU doesn't support
> >     it (because they are completely emulated by KVM, e.g. x2apic).
> >   - On many other cases, the feature may be available but we have to
> >     check if Qemu+KVM are really able to expose it to the guest (many
> >     features work this way, as many depend on specific support by the
> >     KVM kernel module and/or Qemu).
> >   
> >   I suppose libvirt does want to check which flags can be enabled in a
> >   VM, as it already have checks for host CPU features (e.g.
> >   src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
> >   doesn't want to duplicate the KVM feature probing code present on
> >   Qemu, and in this case we could have an interface where libvirt could
> >   query for the actually-available CPU features. Would it be useful for
> >   libvirt? What's the best way to expose this interface?

So, about the above: the cases where libvirt thinks a feature is
available but Qemu knows it is not available are sort-of OK today,
because Qemu would simply refuse to start and an error message would be
returned to the user.

But what about features that are not available on the host CPU, which
libvirt will think can't be enabled, but that _can_ be enabled?
x2apic seems to be the only case today, but we may have others in the
future.


> > 
> > 1.3) Some features are not plain CPU feature bits: e.g. level=X can be
> >   set in "-cpu" argument, and other features are enabled/disabled by
> >   exposing specific CPUID leafs and not just a feature bit (e.g. PMU
> >   CPUID leaf support). I suppose libvirt wants to be able to probe for
> >   those features too, and be able to enable/disable them, right?
> 
> 
> The libvirt CPU definition does not currently store info about the
> level, family, model, stepping, xlevel or model_id items. We really
> ought to fix this, so that libvirt does have that info. Then we'd
> be able to write out a QEMU config that fully specified the exact
> model.

OK, good to know that this is being planned.

> 
> > 2) How to change an existing model and keep existing VMs working?
> > 
> > Sometimes we have to update a CPU model definition because of some bug.
> > Examples:
> > 
> > - The CPU models Conroe, Penryn and Nehalem, have level=2 set. This
> >   works most times, but it breaks CPU core/thread topology enumeration.
> >   We have to change those CPU models to use level=4 to fix the bug.
> 
> This is an example of why libvirt needs to represent the level/family
> etc in its CPU definition. That way, when a guest is first created,
> the XML will save the CPU model, feature flags, level, family, etc
> it is created with. Should the level be changed later, existing guests
> would then not be affected, only new guests would get the level=4

Correct.

> 
> > - This can happen with plain CPU feature bits, too, not just "level":
> >   sometimes real-world CPU models have a feature that is not supported
> >   by Qemu+KVM yet, but when the kernel and Qemu finally starts to
> >   support it, we may want to enable it on existing CPU models. Sometimes
> >   a model simply has the wrong set of feature bits, and we have to fix
> >   it to have the right set of features.
> 
> > 2.3) How all this will interact with cpu_map.xml? Right now there's the
> >   assumption that the CPU model definitions are immutable, right?
> > 
> > 2.4) How do you think libvirt would expose this "CPU model version"
> >   to the user? Should it just expose the unversioned CPU models to the
> >   user, and let Qemu or libvirt choose the right version based on
> >   machine-type?  Should it expose only the versioned CPU models (because
> >   they are immutable) and never expose the unversioned aliases? Should
> >   it expose the unversioned alias, but change the Domain XML definition
> >   automatically to the versioned immutable one (like it happens with
> >   machine-type)?
> 
> We should only expose unversioned CPU models, but then record the
> precise details of the current version in the guest XML.

Sounds good to me.

That answers most of my questions about how libvirt would handle changes
to CPU models. Now we need good mechanisms that allow libvirt to do
that. If you have specific requirements or suggestions in mind, please
let me know.

-- 
Eduardo

* Re: [Qemu-devel] [libvirt]   Qemu, libvirt, and CPU models
  2012-03-07 22:26   ` Eduardo Habkost
@ 2012-03-07 23:07     ` Eric Blake
  2012-03-08 13:01       ` Lee Schermerhorn
  2012-03-08 13:59       ` Eduardo Habkost
  2012-03-08 13:41     ` Jiri Denemark
  1 sibling, 2 replies; 8+ messages in thread
From: Eric Blake @ 2012-03-07 23:07 UTC
  To: Daniel P. Berrange, libvir-list, qemu-devel

On 03/07/2012 03:26 PM, Eduardo Habkost wrote:
> Thanks a lot for the explanations, Daniel.
> 
> Comments about specific items inline.
> 

>>>   - How can we make sure there is no confusion between libvirt and Qemu
>>>     about the CPU models? For example, what if cpu_map.xml says model
>>>     'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
>>>     guarantee that libvirt gets exactly what it expects from Qemu when
>>>     it asks for a CPU model? We have "-cpu ?dump" today, but it's not
>>>     the better interface we could have. Do you miss something in special
>>>     in the Qemu<->libvirt interface, to help on that?
> 
> So, it looks like either I am missing something on my tests or libvirt
> is _not_ probing the Qemu CPU model definitions to make sure libvirt
> gets all the features it expects.
> 
> Also, I would like to ask if you have suggestions to implement
> the equivalent of "-cpu ?dump" in a more friendly and extensible way.
> Would a QMP command be a good alternative? Would a command-line option
> with json output be good enough?

I'm not sure where we are using "-cpu ?dump", but it sounds like we
should be.

> 
> (Do we have any case of capability-querying being made using QMP before
> starting any actual VM, today?)

Right now, we have two levels of queries - the 'qemu -help' and 'qemu
-device ?' output is gathered up front (we really need to patch things
to cache that, rather than repeating it for every VM start).  Then we
start qemu with -S, query QMP, all before starting the guest (qemu -S is
in fact necessary for setting some options that cannot be set in the
current CLI but can be set via the monitor) - but right now that is the
only point where we query QMP capabilities.
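
The caching mentioned above could look roughly like the following Python sketch, which reruns a probe only when the binary's modification time changes. This is an assumption about one possible approach, not libvirt code; `cached_probe`, `_probe_cache` and the `run_probe` callback are hypothetical names.

```python
import os

# Hypothetical cache: binary path -> (mtime, probe output).
_probe_cache = {}


def cached_probe(binary_path, run_probe):
    """Return run_probe(binary_path), rerunning only if the binary changed."""
    mtime = os.stat(binary_path).st_mtime
    entry = _probe_cache.get(binary_path)
    if entry is None or entry[0] != mtime:
        # First use, or the binary was replaced: probe again.
        entry = (mtime, run_probe(binary_path))
        _probe_cache[binary_path] = entry
    return entry[1]
```

Here `run_probe` stands for whatever actually executes the binary (for example with -help) and returns its output, so repeated VM starts with an unchanged binary would reuse the stored result.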

If QMP can alter the CPU model prior to the initial start of the guest,
then that would be a sufficient interface.  But I'm worried that once we
start qemu, even with qemu -S, that it's too late to alter the CPU model
in use by that guest, and that libvirt should instead start querying
these things in advance.  We definitely want a machine-parseable
construct, so querying over QMP rather than '-cpu ?dump' sounds like it
might be nicer, but it would also be more work to set up libvirt to do a
dry-run query of QMP capabilities without also starting a real guest.

> 
> But what about the features that are not available on the host CPU,
> libvirt will think it can't be enabled, but that _can_ be enabled?
> x2apic seems to be the only case today, but we may have others in the
> future.

That's where having an interface to probe qemu to see what capabilities
are possible for any given cpu model would be worthwhile, so that
libvirt can correlate the feature sets properly.

> 
> That answers most of my questions about how libvirt would handle changes
> on CPU models. Now we need good mechanisms that allow libvirt to do
> that. If you have specific requirements or suggestions in mind, please
> let me know.

I'll let others chime in with more responses, but I do appreciate you
taking the time to coordinate this.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [libvirt]   Qemu, libvirt, and CPU models
  2012-03-07 23:07     ` Eric Blake
@ 2012-03-08 13:01       ` Lee Schermerhorn
  2012-03-08 13:59       ` Eduardo Habkost
  1 sibling, 0 replies; 8+ messages in thread
From: Lee Schermerhorn @ 2012-03-08 13:01 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, qemu-devel

On Wed, 2012-03-07 at 16:07 -0700, Eric Blake wrote:
> On 03/07/2012 03:26 PM, Eduardo Habkost wrote:
> > Thanks a lot for the explanations, Daniel.
> > 
> > Comments about specific items inline.
> > 
> 
> >>>   - How can we make sure there is no confusion between libvirt and Qemu
> >>>     about the CPU models? For example, what if cpu_map.xml says model
> >>>     'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
> >>>     guarantee that libvirt gets exactly what it expects from Qemu when
> >>>     it asks for a CPU model? We have "-cpu ?dump" today, but it's not
> >>>     the better interface we could have. Do you miss something in special
> >>>     in the Qemu<->libvirt interface, to help on that?
> > 
> > So, it looks like either I am missing something on my tests or libvirt
> > is _not_ probing the Qemu CPU model definitions to make sure libvirt
> > gets all the features it expects.
> > 
> > Also, I would like to ask if you have suggestions to implement
> > the equivalent of "-cpu ?dump" in a more friendly and extensible way.
> > Would a QMP command be a good alternative? Would a command-line option
> > with json output be good enough?
> 
> I'm not sure where we are using "-cpu ?dump", but it sounds like we
> should be.
> 
> > 
> > (Do we have any case of capability-querying being made using QMP before
> > starting any actual VM, today?)
> 
> Right now, we have two levels of queries - the 'qemu -help' and 'qemu
> -device ?' output is gathered up front (we really need to patch things
> to cache that, rather than repeating it for every VM start).

Eric:

In addition to VM start, it appears that the libvirt qemu driver also
runs both the 32-bit and 64-bit qemu binaries 3 times each when fetching
capabilities, which appears to happen whenever VM state is fetched.  I
noticed this on an openstack/nova compute node that queries VM state
periodically.  It seemed to be taking a long time.  stracing libvirtd
during these queries showed this sequence for each query:

6461  17:15:25.269464 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-cpu", "?"], [/* 2 vars */]) = 0
6462  17:15:25.335300 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-help"], [/* 2 vars */]) = 0
6463  17:15:25.393786 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0
6466  17:15:25.841086 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-cpu", "?"], [/* 2 vars */]) = 0
6468  17:15:25.906746 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-help"], [/* 2 vars */]) = 0
6469  17:15:25.980520 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0

This seems to add about a second per VM running on the host.  The periodic
scan thus takes a couple of minutes on a heavily loaded host -- one with
several tens of VMs.  Not a killer, but we'd like to eliminate it.

I see that libvirt does some level of caching of capabilities, checking
the st_mtime of the binaries to detect changes.  I haven't figured out
when that caching comes into effect, but it doesn't prevent the execs
above.  So, I created a patch series that caches the results of parsing
the output of these calls that I will post shortly for RFC.  It
eliminates most of such execs.  I think it might obviate the existing
capabilities caching, but I'm not sure.  Haven't had time to look into
it.
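
To illustrate the idea of caching parsed probe output keyed on the
binary's st_mtime, here is a minimal Python sketch. It is not the patch
series mentioned above and not libvirt code: the names probe, _run and
_cache are made up for this example, and a real implementation would
also have to cache the parsed result rather than the raw text.

```python
import os
import subprocess

# Hypothetical in-memory cache: (binary, args) -> (mtime when probed, output).
_cache = {}


def _run(argv):
    """Run a probe command and return its stdout as text."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


def probe(binary, args, run=_run, mtime_of=os.path.getmtime):
    """Return the output of `binary args...`, re-running it only when the
    binary's modification time differs from the one seen at the last probe."""
    key = (binary, tuple(args))
    mtime = mtime_of(binary)
    hit = _cache.get(key)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # binary unchanged since last probe: no exec needed
    output = run([binary, *args])
    _cache[key] = (mtime, output)
    return output
```

With something like this, a call such as probe("/usr/bin/qemu", ["-cpu",
"?"]) would exec the binary once and answer later calls from memory
until the binary is replaced.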

Later,
Lee Schermerhorn
HPCS


> Then we
> start qemu with -S, query QMP, all before starting the guest (qemu -S is
> in fact necessary for setting some options that cannot be set in the
> current CLI but can be set via the monitor) - but right now that is the
> only point where we query QMP capabilities.
<snip>


* Re: [Qemu-devel] [libvirt] Qemu, libvirt, and CPU models
  2012-03-07 22:26   ` Eduardo Habkost
  2012-03-07 23:07     ` Eric Blake
@ 2012-03-08 13:41     ` Jiri Denemark
  2012-03-09 17:37       ` Eduardo Habkost
  1 sibling, 1 reply; 8+ messages in thread
From: Jiri Denemark @ 2012-03-08 13:41 UTC (permalink / raw)
  To: Daniel P. Berrange, libvir-list, qemu-devel

On Wed, Mar 07, 2012 at 19:26:25 -0300, Eduardo Habkost wrote:
> Awesome. So, if Qemu and libvirt disagrees, libvirt will know that and
> add the necessary flags? That was my main worry. If disagreement between
> Qemu and libvirt is not a problem, it would make things much easier.
> 
> ...but:
> 
> Is that really implemented? I simply don't see libvirt doing that. I see
> code that calls "-cpu ?" to list the available CPU models, but no code
> calling "-cpu ?dump", or parsing the Qemu CPU definition config file. I
> even removed some random flags from the Nehalem model on my machine
> (running Fedora 16), and no additional flags were added.

Right, currently we only detect whether Qemu knows the requested CPU model and
use another one if not. We should really start using something like -cpu ?dump.
However, since qemu may decide to change parts of the model according to,
e.g., machine type, we would need something more dynamic. Something like, "hey
Qemu, this is the machine type and CPU model we want to use, these are the
features we want in this model, and we also want a few additional features,
please, tell us what the resulting CPU configuration is (if it is even
possible to deliver such a CPU on the current host)". And the result would be
a complete CPU model, which may of course be different from what qemu's
configuration file says. We could then use the result to update domain XML (in
a way similar to how we handle machine types) so that we can guarantee the
guest will always see the same CPU. Once the CPU is updated, we could just check
with Qemu if it can provide such a CPU and start (or refuse to start) the
domain. Does it seem reasonable?
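
The shape of such a probe can be sketched as a small Python function. No
such Qemu interface exists in this thread, so everything here is an
assumption: expand_cpu, MODELS and MACHINE_REMOVES are hypothetical, and
'Moo', 'foo', 'bar' and 'pc-old' are placeholder names, not real models,
flags or machine types.

```python
# Toy stand-ins for Qemu's real tables. 'Moo', 'foo', 'bar' and 'pc-old'
# are placeholders, not real model, flag or machine-type names.
MODELS = {"Moo": {"foo", "bar"}}
MACHINE_REMOVES = {("pc-old", "Moo"): {"bar"}}


def expand_cpu(machine, model, extra, host):
    """Resolve a request into the full CPU the guest would see.

    Returns a dict with the complete feature list, the features the host
    cannot deliver, and whether the CPU is runnable at all."""
    if model not in MODELS:
        raise KeyError("unknown CPU model: " + model)
    features = set(MODELS[model])
    # The machine type may change parts of the model.
    features -= MACHINE_REMOVES.get((machine, model), set())
    # Additional features requested on top of the model.
    features |= set(extra)
    missing = features - set(host)
    return {
        "model": model,
        "machine": machine,
        "features": sorted(features),
        "missing": sorted(missing),
        "runnable": not missing,
    }
```

The returned "features" list is what would be written back into the
domain XML, and "runnable" corresponds to the start-or-refuse decision.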

> > We could go one step further and just write
> > out a cpu.conf file that we load in QEMU with -loadconfig.
> 
> Sounds good. Anyway, I want to make everything configurable on the
> cpudef config file configurable on the command-line too, so both options
> (command-line or config file) would work.

I'd be afraid of hitting the command line length limit if we specified all CPU
details in it :-)

> So, it looks like either I am missing something on my tests or libvirt
> is _not_ probing the Qemu CPU model definitions to make sure libvirt
> gets all the features it expects.
> 
> Also, I would like to ask if you have suggestions to implement
> the equivalent of "-cpu ?dump" in a more friendly and extensible way.
> Would a QMP command be a good alternative? Would a command-line option
> with json output be good enough?

I quite like the possible solution Anthony (or perhaps someone else) suggested
some time ago (it may however be biased by my memory): qemu could provide a
command line option that would take QMP command(s) and the result would be QMP
response on stdout. We could use this interface for all kinds of probes with
easily parsed output.

> (Do we have any case of capability-querying being made using QMP before
> starting any actual VM, today?)

Not really. We only query QMP for the available QMP commands that we can
use later on when the domain is running.

> So, about the above: the cases where libvirt thinks a feature is
> available but Qemu knows it is not available are sort-of OK today,
> because Qemu would simply refuse to start and an error message would be
> returned to the user.

Really? In my experience qemu just ignored the feature it didn't know about
without any error message and started the domain happily. It might be because
libvirt doesn't use anything like -cpu ...,check or whatever is needed to make
it fail. However, I think we should fix it.

> But what about the features that are not available on the host CPU,
> libvirt will think it can't be enabled, but that _can_ be enabled?
> x2apic seems to be the only case today, but we may have others in the
> future.

I think qemu could tell us about those features during the probe phase (my
first paragraph) and we would either use them with policy='force' or something
similar.
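
As a rough illustration of that correlation step, the Python sketch
below maps each wanted feature to a policy. The function name and the
idea of an "emulated" set reported by Qemu are assumptions made for this
example; only the policy names 'require' and 'force' come from libvirt's
CPU feature element.

```python
def feature_policies(wanted, host, emulated):
    """Map each wanted feature to a libvirt-style policy.

    'require' : the host CPU has it
    'force'   : the host lacks it, but Qemu reports it can provide it anyway
    None      : nobody can provide it, so the request cannot be satisfied"""
    policies = {}
    for feature in sorted(wanted):
        if feature in host:
            policies[feature] = "require"
        elif feature in emulated:
            policies[feature] = "force"
        else:
            policies[feature] = None
    return policies
```

For the x2apic case discussed here, a host without x2apic and a Qemu
that reports it as emulated would yield 'force' for that feature.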

Jirka


* Re: [Qemu-devel] [libvirt]   Qemu, libvirt, and CPU models
  2012-03-07 23:07     ` Eric Blake
  2012-03-08 13:01       ` Lee Schermerhorn
@ 2012-03-08 13:59       ` Eduardo Habkost
  1 sibling, 0 replies; 8+ messages in thread
From: Eduardo Habkost @ 2012-03-08 13:59 UTC (permalink / raw)
  To: Eric Blake; +Cc: libvir-list, qemu-devel

On Wed, Mar 07, 2012 at 04:07:06PM -0700, Eric Blake wrote:
> > 
> > (Do we have any case of capability-querying being made using QMP before
> > starting any actual VM, today?)
> 
> Right now, we have two levels of queries - the 'qemu -help' and 'qemu
> -device ?' output is gathered up front (we really need to patch things
> to cache that, rather than repeating it for every VM start).

That's what I feared. I was wondering if we had a better
machine-friendly interface to make some of these queries, today.

> Then we
> start qemu with -S, query QMP, all before starting the guest (qemu -S is
> in fact necessary for setting some options that cannot be set in the
> current CLI but can be set via the monitor) - but right now that is the
> only point where we query QMP capabilities.
> 
> If QMP can alter the CPU model prior to the initial start of the guest,
> then that would be a sufficient interface.  But I'm worried that once we
> start qemu, even with qemu -S, that it's too late to alter the CPU model
> in use by that guest, and that libvirt should instead start querying
> these things in advance.

This is probably true, and I don't see this being changed in the near
future.

Even if we fix that for CPU initialization, there are many other
initialization steps involved that would have to be reworked to allow
all capability querying to be made to the same Qemu process that would
run the VM later.

> We definitely want a machine-parseable
> construct, so querying over QMP rather than '-cpu ?dump' sounds like it
> might be nicer, but it would also be more work to set up libvirt to do a
> dry-run query of QMP capabilities without also starting a real guest.

On the other hand, with QMP we would have a better interface that could
be used for all other queries libvirt has to run. Instead of running
Qemu multiple times for capability querying, just start a single Qemu
process and make the capability queries using QMP. I don't know if this
was discussed or considered before.
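
A minimal Python sketch of the "one process, many queries" idea follows.
It assumes a QMP channel is already open and that each JSON message
arrives on its own line; how the Qemu process is started for a query-only
session is left out, since that use-case is exactly what is still
unsettled here. qmp_query, recv_line and send_line are names invented for
this example; qmp_capabilities is the real QMP negotiation command.

```python
import json


def qmp_query(recv_line, send_line, commands):
    """Run several QMP commands over one open channel.

    recv_line() returns the next line sent by Qemu; send_line(text) sends
    one line to it. Returns {command name: value of its "return" key}."""
    greeting = json.loads(recv_line())
    if "QMP" not in greeting:
        raise RuntimeError("not a QMP greeting: %r" % greeting)
    results = {}
    # qmp_capabilities must be issued first to leave negotiation mode.
    for name in ["qmp_capabilities"] + list(commands):
        send_line(json.dumps({"execute": name}) + "\n")
        reply = json.loads(recv_line())
        while "event" in reply:  # skip asynchronous events
            reply = json.loads(recv_line())
        if "error" in reply:
            raise RuntimeError("%s failed: %r" % (name, reply["error"]))
        results[name] = reply["return"]
    del results["qmp_capabilities"]
    return results
```

With a connected socket, recv_line could be the readline method of
sock.makefile("rw") and send_line a small function that writes and
flushes on the same file object. A call such as
qmp_query(recv, send, ["query-version", "query-commands"]) would then
replace several separate Qemu invocations.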

> > 
> > But what about the features that are not available on the host CPU,
> > libvirt will think it can't be enabled, but that _can_ be enabled?
> > x2apic seems to be the only case today, but we may have others in the
> > future.
> 
> That's where having an interface to probe qemu to see what capabilities
> are possible for any given cpu model would be worthwhile, so that
> libvirt can correlate the feature sets properly.

Yes. The issue currently is that many things don't depend just on static
CPU model or machine-type definitions: libvirt has to know what
capabilities the kernel provides and which ones Qemu will really be able
to use.

It will be a long way to fix this. Some features are simply not
configurable yet, even on the command-line. They are just automatically
used by Qemu when provided by the kernel.

-- 
Eduardo


* Re: [Qemu-devel] [libvirt] Qemu, libvirt, and CPU models
  2012-03-08 13:41     ` Jiri Denemark
@ 2012-03-09 17:37       ` Eduardo Habkost
  0 siblings, 0 replies; 8+ messages in thread
From: Eduardo Habkost @ 2012-03-09 17:37 UTC (permalink / raw)
  To: Daniel P. Berrange, libvir-list, qemu-devel, Jiri Denemark

On Thu, Mar 08, 2012 at 02:41:54PM +0100, Jiri Denemark wrote:
> On Wed, Mar 07, 2012 at 19:26:25 -0300, Eduardo Habkost wrote:
> > Awesome. So, if Qemu and libvirt disagrees, libvirt will know that and
> > add the necessary flags? That was my main worry. If disagreement between
> > Qemu and libvirt is not a problem, it would make things much easier.
> > 
> > ...but:
> > 
> > Is that really implemented? I simply don't see libvirt doing that. I see
> > code that calls "-cpu ?" to list the available CPU models, but no code
> > calling "-cpu ?dump", or parsing the Qemu CPU definition config file. I
> > even removed some random flags from the Nehalem model on my machine
> > (running Fedora 16), and no additional flags were added.
> 
> Right, currently we only detect if Qemu knows requested CPU model and use
> another one if not. We should really start using something like -cpu ?dump.
> However, since qemu may decide to change parts of the model according to,
> e.g., machine type, we would need something more dynamic. Something like, "hey
> Qemu, this is the machine type and CPU model we want to use, these are the
> features we want in this model, and we also want few additional features,
> please, tell us what the resulting CPU configuration is (if it is even
> possible to deliver such CPU on current host)". And the result would be
> complete CPU model, which may of course be different from what the qemu's
> configuration file says. We could then use the result to update domain XML (in
> a way similar to how we handle machine types) so that we can guarantee the
> guest will always see the same CPU. Once CPU is updated, we could just check
> with Qemu if it can provide such CPU and start (or refuse to start) the
> domain. Does it seem reasonable?

Absolutely.

I would even advise libvirt to refrain from using "-cpu ?dump", as its
semantics are likely to change.

> > > We could go one step further and just write
> > > out a cpu.conf file that we load in QEMU with -loadconfig.
> > 
> > Sounds good. Anyway, I want to make everything configurable on the
> > cpudef config file configurable on the command-line too, so both options
> > (command-line or config file) would work.
> 
> I'd be afraid of hitting the command line length limit if we specified all CPU
> details in it :-)

True. I am already afraid of hitting the command-line length limit with
Qemu as-is right now.  ;-)


> > So, it looks like either I am missing something on my tests or libvirt
> > is _not_ probing the Qemu CPU model definitions to make sure libvirt
> > gets all the features it expects.
> > 
> > Also, I would like to ask if you have suggestions to implement
> > the equivalent of "-cpu ?dump" in a more friendly and extensible way.
> > Would a QMP command be a good alternative? Would a command-line option
> > with json output be good enough?
> 
> I quite like the possible solution Anthony (or perhaps someone else) suggested
> some time ago (it may however be biased by my memory): qemu could provide a
> command line option that would take QMP command(s) and the result would be QMP
> response on stdout. We could use this interface for all kinds of probes with
> easily parsed output.

This is another case where command-line limits could be hit, isn't it?
Reading QMP commands from a normal chardev (a socket, or even stdio) is
already available; we just need to make sure the "query QMP without ever
initializing a machine" use-case is working and really supported by
Qemu.

> > So, about the above: the cases where libvirt thinks a feature is
> > available but Qemu knows it is not available are sort-of OK today,
> > because Qemu would simply refuse to start and an error message would be
> > returned to the user.
> 
> Really? In my experience qemu just ignored the feature it didn't know about
> without any error message and started the domain happily. It might be because
> libvirt doesn't use anything like -cpu ...,check or whatever is needed to make
> it fail. However, I think we should fix it.

Correct, I was assuming that 'enforce' was being used. I forgot that
libvirt doesn't use it today.

I really think libvirt should be using 'enforce'. The only problem is
that there may be cases where an existing VM was working (but with a
result unpredictable by libvirt), and with 'enforce' it would stop
working. This is very likely to happen when using the default "qemu64"
CPU model, which has some AMD-only CPUID:8000_0000h bits set, but
everybody probably expects it to work on Intel CPU hosts too.
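
For reference, here is a small Python sketch of how a management tool
could assemble a "-cpu" argument with 'enforce' appended. The helper
name cpu_arg is made up for this example and is not libvirt code; the
comma-separated "+feature"/"-feature" syntax and the 'enforce' flag are
the ones discussed in this thread.

```python
def cpu_arg(model, enable=(), disable=(), enforce=False):
    """Build the argv fragment for Qemu's -cpu option.

    enable/disable are feature names to add to or remove from the model;
    enforce appends the 'enforce' flag so that Qemu refuses to start when
    a requested feature cannot be provided."""
    parts = [model]
    parts += ["+" + name for name in enable]
    parts += ["-" + name for name in disable]
    if enforce:
        parts.append("enforce")
    return ["-cpu", ",".join(parts)]
```

For example, cpu_arg("qemu64", enable=["x2apic"], enforce=True) produces
the two arguments "-cpu" and "qemu64,+x2apic,enforce".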

> 
> > But what about the features that are not available on the host CPU,
> > libvirt will think it can't be enabled, but that _can_ be enabled?
> > x2apic seems to be the only case today, but we may have others in the
> > future.
> 
> I think qemu could tell us about those features during the probe phase (my
> first paragraph) and we would either use them with policy='force' or something
> similar.

Yes, that's the conclusion I was trying to reach: we really need better
CPU feature probing.

-- 
Eduardo


Thread overview: 8+ messages
2012-03-06 18:27 [Qemu-devel] Qemu, libvirt, and CPU models Eduardo Habkost
2012-03-07 14:18 ` [Qemu-devel] [libvirt] " Daniel P. Berrange
2012-03-07 22:26   ` Eduardo Habkost
2012-03-07 23:07     ` Eric Blake
2012-03-08 13:01       ` Lee Schermerhorn
2012-03-08 13:59       ` Eduardo Habkost
2012-03-08 13:41     ` Jiri Denemark
2012-03-09 17:37       ` Eduardo Habkost
