From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:48632)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1SSTJd-0001i5-FI
	for qemu-devel@nongnu.org; Thu, 10 May 2012 09:22:02 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1SSTJW-00048L-35
	for qemu-devel@nongnu.org; Thu, 10 May 2012 09:21:57 -0400
Received: from cantor2.suse.de ([195.135.220.15]:34731 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1SSTJV-000488-Lx
	for qemu-devel@nongnu.org; Thu, 10 May 2012 09:21:50 -0400
Message-ID: <4FABC0E5.7040509@suse.de>
Date: Thu, 10 May 2012 15:21:41 +0200
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <20120507182142.GD16951@otherpad.lan.raisama.net>
	<B4321C29-D9D7-463C-8B0E-6A58649646DC@suse.de>
	<20120508201441.GN4373@otherpad.lan.raisama.net>
	<6BF7428F-FDEF-4497-94F5-7A43BC9E1E67@suse.de>
	<20120509081404.GO15960@redhat.com>
	<0FA57537-0C33-468F-B416-AEB2487A9DFD@suse.de>
	<20120509085151.GQ15960@redhat.com>
	<C80F7A4A-665A-46F2-BF8F-33B0F0ADDADA@suse.de>
	<20120509093837.GT15960@redhat.com>
	<20120509193802.GE4373@otherpad.lan.raisama.net>
	<20120510125342.GB4296@redhat.com>
In-Reply-To: <20120510125342.GB4296@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Semantics of "-cpu host" (was Re: [PATCH 2/2]
 Expose tsc deadline timer cpuid to guest)
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Gleb Natapov <gleb@redhat.com>
Cc: Andre Przywara <andre.przywara@amd.com>, Eduardo Habkost <ehabkost@redhat.com>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>, Jan Kiszka <jan.kiszka@siemens.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>

On 05/10/2012 02:53 PM, Gleb Natapov wrote:
> On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
>> On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
>>> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
>>>> On 09.05.2012, at 10:51, Gleb Natapov wrote:
>>>>
>>>>> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>>>>>>
>>>>>> On 09.05.2012, at 10:14, Gleb Natapov<gleb@redhat.com>  wrote:
>>>>>>
>>>>>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>>>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>>>>>>
>>>>>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>>>>>>
>>>>>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>>>>>>
>>>>>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>>>>>> expected meaning of "-cpu host")
>>>>>>>>>>>>
>>>>>>>>>>> [...]
>>>>>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>>>>>> use case I am thinking about:
>>>>>>>>>>>>
>>>>>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>>>>>> doesn't require additional userspace support)
>>>>>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>>>>>> upgrading Qemu.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>>>>>> upgrade).
>>>>>>>>>>>>
>>>>>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>>>>>>
>>>>>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>>>>>> userspace support)
>>>>>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>>>>>> host", otherwise Qemu would break.
>>>>>>>>>>>>
>>>>>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>>>>>> expected semantics on the cases above.
>>>>>>>>>> Can you think of any feature that'd go into category B?
>>>>>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>>>>>> the in-kernel irqchip.
>>>>>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>>>>>>
>>>>>>> How kernel should know that userspace does not emulate it?
>>>>>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>>>>>>
>>>>>>>>> - x2apic: ditto.
>>>>>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>>>>>>
>>>>>>> Same here.
>>>>>>>
>>>>>>> Well, technically both of those features can't be implemented in
>>>>>>> userspace right now since MSRs are terminated in the kernel, but I
>>>>>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>>>>>>
>>>>> You mean terminating MSRs in kernel does not sound like the greatest
>>>>> design? I do not disagree. That is why IMO kernel can't filter out
>>>>> TSC-deadline and x2apic like you suggest.
>>>> I still don't see why it can't.
>>>>
>>>> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
>>>> Now, we implement TSC-deadline in the kernel. We still filter
>>>> TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
>>>> an interface to user space that says "call me to enable TSC-deadline
>>>> CPUID, but only if you're using the in-kernel apic"
>> We have that interface already, it is called KVM_SET_CPUID.  :-)
>>
>>>> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
>>>> Old user space doesn't call that ioctl.
>>> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
>>> any additional ioctls. And second I do not see why we need additional
>>> iostls here.
>> We don't have TSC-deadline set today (and that's what started this
>> thread), but we have x2apic. So what you say is true for x2apic, anyway.
>>
> Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)
>
>>> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
>>> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
>>> from KVM_SET_CPUID? For those two features it may make sense indeed.
>> It makes sense to me.
>>
>> It looks like my assumptions were wrong. They were:
>>
>> - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
>>    going to be enabled or not.
> I think this is currently true. Before we changing that we have to make
> sure that no existing userspace calls GET_SUPPORTED_CPUID before
> creating irqchip. Not sure we can check all existing userspaces.
>
>> - GET_SUPPORTED_CPUID output has to be a function of the kernel code
>>    capabilitie and host CPU, and not depend on any input from userspace.
>>
>> Are those assumptions incorrect? If we break them, we may try what
>> Alexander is proposing. It would be much more flexible than the options
>> I was considering.
>>
>> I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> What is ENABLE_CAP?

It's an ioctl that you pass in a CAP. It's used on non-x86 already to 
easily turn on features inside kvm :).

>
>> check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
>> be used by userpace to tell the kernel "I will enable the in-kernel
>> irqchip, so feel free to return features that depend on it on
>> GET_SUPPORTED_CPUID".
>>
>> In other words, we would return only the type-A features on
>> GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
>> long as migration is not required), but if we use ENABLE_CAP we can make
>> group A safely grow, as long as userspace first tells the kernel what it
>> supports.
>>
>> Anybody is against doing that? Otherwise I plan to work on this.
>> Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
>> unless userspace tells the kernel (using ENABLE_CAP) that it will enable
>> the in-kernel irqchip. Then we can do the same with TSC-deadline.
>>
>> Note that all this work is to allow the kernel to let userspace blindly
>> enable features it _doesn't know yet_. If we limit ourselves to features
>> userspace already knows about, we could simply remove x2apic and
>> TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
>> that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
>> if it knows it is safe (either because it checked for the corresponding
>> KVM_CAP_* capability is present and it will enable the in-kernel
>> irqchip, or because it will emulated it in userspace).
>>
> There is not KVM_CAP_* for x2apic as far as I see.
>
>> In case anybody is against the proposal above: note that the current
>> documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
>> that depend on specific userspace behavior/capabilities) is simply
>> unusable by "-cpu host". If the above proposal gets rejected, my Plan B
>> is to update the GET_SUPPORTED_CPUID documentation to note that it
>> returns only type-A features (features that userspace can safely enable
>> even if it doesn't know what it does), remove x2apic from
>> GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
>> type-B features (features that depend on specific userspace
>> capabilities/behavior).
>>
> This will break existing userspaces, no?

It would mean that new user space on old kernels don't get x2apic or 
tsc-deadline in cpuid, yes.

>
>>> Not
>>> sure there won't be others that are not dependent on irq chip presence.
>>> You propose to add additional ioctls to enable them if they appear?
>> I am sure there will be new features in the future that don't depend on
>> any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
>> unconditionally.
>>
> Those not the kind of features I am worry about though. I worry about
> the features that can be emulated either by kernel or by userspace and
> userspace may choose where it wants the feature to be emulated.

Like? Usually CPUID features reflect new instructions. We don't notify 
user space on unknown instructions, right?


Alex