From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:35925)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Z7lpO-0002jW-2y
	for qemu-devel@nongnu.org; Wed, 24 Jun 2015 10:39:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Z7lpJ-0004uh-6n
	for qemu-devel@nongnu.org; Wed, 24 Jun 2015 10:39:01 -0400
Received: from mx1.redhat.com ([209.132.183.28]:59074)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Z7lpI-0004uU-VJ
	for qemu-devel@nongnu.org; Wed, 24 Jun 2015 10:38:57 -0400
Message-ID: <558AC0FB.6090608@redhat.com>
Date: Wed, 24 Jun 2015 16:38:51 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <20150608201835.GM3525@orkuz.home> <558951C0.3050806@suse.de>
	<20150623150828.GD3134@thinpad.lan.raisama.net>
	<20150623173048-mutt-send-email-mst@redhat.com>
	<20150623155832.GE3134@thinpad.lan.raisama.net>
	<55898637.6080804@suse.de> <20150623162555.GL30318@redhat.com>
	<20150623183115-mutt-send-email-mst@redhat.com>
	<20150623164204.GM30318@redhat.com>
	<20150623231818-mutt-send-email-mst@redhat.com>
	<20150624141651.GS3134@thinpad.lan.raisama.net>
In-Reply-To: <20150624141651.GS3134@thinpad.lan.raisama.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 0/2] target-i386: "custom" CPU model +
 script to dump existing CPU models
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eduardo Habkost <ehabkost@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: mimu@linux.vnet.ibm.com, qemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>, borntraeger@de.ibm.com, Igor Mammedov <imammedo@redhat.com>, Jiri Denemark <jdenemar@redhat.com>, Andreas Färber <afaerber@suse.de>, rth@twiddle.net


On 24/06/2015 16:16, Eduardo Habkost wrote:
> > So any single CPU flag now needs to be added in
> > - kvm
> > - qemu
> > - libvirt
> >
> > Next thing libvirt will decide it's a policy thing and so
> > needs to be pushed up to openstack.
>
> I don't think that will happen, but if they really decide to do it, why
> should we try to stop them? libvirt and OpenStack know what their users
> do/need better than us, and if they believe moving data to OpenStack
> will provide what users need, they are free to do it. I trust libvirt
> developers to do the right thing, here.

After talking to Daniel for more than 1 hour I actually think that
OpenStack's scheduler is totally broken with respect to QEMU upgrades.
For some reason they focused on CPU model changes, but actually there's
no past case where runnability actually changed with a QEMU upgrade, and
there will be no such future case; even without "enforce", QEMU
introduces new models long after all features have been available in
KVM.  At this point CPU is no different from any other device.

At the same time, OpenStack's scheduler is not trying to use a machine
type that is available on all nodes of a compute node pool.  I'm not
sure how one can be sure that migration would succeed in these
circumstances.

It's certainly okay for libvirt and OpenStack to use the host CPU
features in order to check whether a node will run a given VM.  However,
libvirt should trust that QEMU developers will not prevent a VM from
running on a previously viable host, just because you change the machine
type.
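
A minimal sketch of that kind of check, assuming CPU features can be
modelled as plain sets of flag names.  The function name and the flag
sets are hypothetical; this is not libvirt or OpenStack code:

```python
def node_can_run(vm_features, host_features):
    """Return True if the host exposes every CPU feature the VM needs.

    Both arguments are iterables of flag names.  Note that the machine
    type is deliberately not an input: only host features decide.
    """
    return set(vm_features) <= set(host_features)


# Hypothetical flag sets, for illustration only.
vm = {"sse4.2", "avx", "aes"}
host_a = {"sse4.2", "avx", "aes", "rdrand"}
host_b = {"sse4.2", "aes"}

print(node_can_run(vm, host_a))
print(node_can_run(vm, host_b))
```

The check takes no machine type argument, so changing the machine type
cannot turn a viable host into a non-viable one.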

And OpenStack should _really_ use the machine type as the abstract
representation of what is compatible with what inside a node pool.  This
is what RHEV does, and I see no reason to do it differently.  In
particular there will not be excessive fragmentation of the nodes into
too many pools, because a deployment with too many active machine types
is just asking for trouble.
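
As a rough illustration, assuming each node can report the machine
types its QEMU supports, the pool-wide choice is a set intersection.
The function and the node names below are hypothetical and do not
reflect how RHEV or OpenStack implement it:

```python
def common_machine_types(pool):
    """Return the machine types supported by every node in the pool.

    `pool` maps a node name to the collection of machine types that
    node's QEMU supports.  An empty pool has no common machine types.
    """
    supported = [set(types) for types in pool.values()]
    if not supported:
        return set()
    return set.intersection(*supported)


# Hypothetical pool: node2 runs an older QEMU without the 2.3 type.
pool = {
    "node1": ["pc-i440fx-2.1", "pc-i440fx-2.2", "pc-i440fx-2.3"],
    "node2": ["pc-i440fx-2.1", "pc-i440fx-2.2"],
    "node3": ["pc-i440fx-2.2", "pc-i440fx-2.3", "pc-i440fx-2.1"],
}
print(sorted(common_machine_types(pool)))
```

A VM started with a machine type from that intersection can, as far as
the machine type is concerned, be migrated to any node in the pool.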

The possible exceptions are weird cases involving nested virt and

outdated QEMU on the L0 host, but even those cases can be fixed just by
having an up-to-date cpu_map.xml.

Other random notes from my chat:

1) libvirt should _not_ change the flags that the user passes via XML
just because it thinks that QEMU has those flags.  This makes it
possible for libvirt to keep cpu_map.xml up-to-date without worrying
about versioning.

2) libvirt should not add/remove flags when the user specifies
host-model (i.e. -cpu SandyBridge, not -cpu
SandyBridge,+f16c,+rdrand,+erms,+whatever).  host-model has had so many
bugs reported for it that I hope this could be done unconditionally even
if it is not backwards-compatible.  Or perhaps introduce a new name and
deprecate host-model.  I don't know.

3) regarding "enforce", there are indeed some cases where it would break:

- Haswell/Broadwell CPU model after TSX removal

- qemu64 with KVM

- pretty much everything including qemu64 with TCG

So libvirt here could allow the user to specify enforce _now_, either
via XML or via qemu.conf (or via XML + a default specified via qemu.conf).
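
For reference, a simplified model of the difference "enforce" makes:
without it, flags the host cannot provide are dropped from the guest
CPU; with it, startup fails instead.  This is a sketch in Python with
hypothetical names and illustrative flag sets, not QEMU code:

```python
def resolve_cpu_flags(requested, host_supported, enforce):
    """Return (flags the guest gets, sorted list of missing flags).

    With enforce=True, any missing flag is a hard error.  With
    enforce=False, missing flags are simply dropped.
    """
    requested = set(requested)
    host_supported = set(host_supported)
    missing = sorted(requested - host_supported)
    if missing and enforce:
        raise RuntimeError(
            "host lacks requested CPU flags: " + ", ".join(missing))
    return requested & host_supported, missing


# Illustrative only: a model that asks for the TSX flags (hle, rtm)
# on a host that no longer exposes them.
requested = {"avx2", "hle", "rtm"}
host = {"avx2"}

flags, missing = resolve_cpu_flags(requested, host, enforce=False)
print(sorted(flags), missing)

try:
    resolve_cpu_flags(requested, host, enforce=True)
except RuntimeError as err:
    print("refused to start:", err)
```

In the non-enforce case the guest silently sees a different CPU than
the one requested, which is why a default in qemu.conf could matter.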


So, I _hate_ to block a feature that is anyway useful for debugging.
And this feels too much like the typical "kernel developers don't like
systemd" rant.  But it looks like this feature is not only too easy to
misuse: there are _already_ plans for misusing it and sweeping the whole
machine compatibility issue under the rug in OpenStack.  So I agree with
Andreas, and would prefer to have a serious discussion with the
OpenStack folks before accepting it.

Paolo