From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
peterx@redhat.com, Andrew Jones <drjones@redhat.com>,
kvm@vger.kernel.org
Subject: Re: GET_SUPPORTED_CPUID and irqchip (was Re: kvm_pv_unhalt and kernel_irqchip=off)
Date: Mon, 15 Aug 2016 19:50:59 +0200 [thread overview]
Message-ID: <20160815175059.GA12532@potion> (raw)
In-Reply-To: <20160815130346.GP5627@thinpad.lan.raisama.net>
2016-08-15 10:03-0300, Eduardo Habkost:
> On Sat, Aug 13, 2016 at 02:43:03PM +0200, Radim Krčmář wrote:
>> I would like to return -EINVAL from KVM_SET_CPUID2 if userspace
>> requested a new CPUID feature that cannot work in given situation.
>>
>> Another way would be to disable buggy features in KVM_SET_CPUID2, which
>> would require userspace to call KVM_GET_CPUID2 afterwards to learn what
>> the guest is actually using.
>
> I'd prefer to get -EINVAL. QEMU needs to be 100% aware of the
> resulting CPUID bits, so if we end up trying to enable something
> that will never work, it's already too late to silently clear
> CPUID bits.
>
> This shouldn't make any difference for existing QEMU code,
> because it already filters out all CPUID feature flags based
> on GET_SUPPORTED_CPUID (+ the extra fixups in
> QEMU's kvm_arch_get_supported_cpuid()).
>
> But: note that returning -EINVAL might break very old QEMU
> versions. Maybe we really want them to break (because they have
> broken CPUID logic), but I am not sure.
Returning -EINVAL would break even current QEMU users, because they
might have '-cpu host' with a guest that doesn't use PV_UNHALT.
-EINVAL without any interface changes could be done only for future
features. :/
>> I have patches that implement the latter for X2APIC and PV_UNHALT, but
>> I'm not sure if it's better than leaving the bug unfixed, because QEMU
>> doesn't use KVM_GET_CPUID2 and migration to older KVM would change
>> CPUID, which is a very subtle bug.
>> What do you think?
>
> Implementing those checks basically mean moving the existing
> kvm_arch_get_supported_cpuid() logic inside KVM. If we do that,
> can we get an interface that will allow QEMU to just query KVM
> instead of duplicating that logic?
The QEMU code for old kernels likely has to stay so only future features
wouldn't be duplicated.
An interface to check what KVM disabled already exists, KVM_GET_CPUID2.
We'd be hijacking it for "SET -> GET -> compare" where QEMU would exit()
if features got cleared (and/or added?).
It an extension of existing practice -- new features would be notified
with a capability and SET+GET would check if they really can be enabled.
We'd need a toggle (a way for QEMU to tell KVM that it will GET and
compare) to allow KVM safely clear already released features.
e.g. migration wouldn't enable this feature.
The toggle for SET+GET would be very simple in the kernel, but might
turn out to be complicated in QEMU ...
See the rough idea at the bottom.
> Making KVM_SET_CPUID2 clear those bits and/or return -EINVAL
> would probably be enough as an interface, as long as QEMU has a
> way to know in advance if it needs to clear CPUID bits based on
> the legacy kvm_arch_get_supported_cpuid() code (for older
> kernels), or if it can just give everything to KVM_SET_CPUID2,
> and check for unsupported/cleared flags later (for newer
> kernels).
A new IOCTL that returns valid-but-not-in-supported and
invalid-from-supported CPUIDs and forces KVM to -EINVAL if userspace
doesn't listen is in the game too. It is more complicated, though.
---8<---
Rough (doesn't even define everything) idea of the clearing approach.
---
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9ab16b2..8d085823c4af 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -545,6 +545,9 @@ struct kvm_cpuid {
struct kvm_cpuid_entry entries[0];
};
+If KVM_CAP_CLEAR_INVALID_CPUID is enabled, then the ioctls can modify CPUID
+bits to prevent invalid configurations.
+
4.21 KVM_SET_SIGNAL_MASK
@@ -3900,6 +3903,18 @@ to take care of that.
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.9 KVM_CAP_CLEAR_INVALID_CPUID
+
+Architectures: x86
+Parameters: none
+
+After setting this capability, KVM_SET_CPUID2 can modify CPUID bits that are
+invalid in the given configuration, e.g. disable KVM_FEATURE_PV_UNHALT when the
+LAPIC is in the userspace.
+
+The userspace is expected to use KVM_GET_CPUID2 to retrieve the final CPUID.
+
+
8. Other capabilities.
----------------------
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb596cd..afa05bb31cc3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -160,6 +160,21 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
}
}
+static void kvm_clear_invalid_cpuid(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_cpuid_entry2 *best;
+
+ if (!kvm->clear_invalid_cpuid)
+ return;
+
+ best = kvm_find_cpuid_entry(vcpu, 1, 0);
+ if (best && !lapic_in_kernel(vcpu))
+ best->ecx &= ~F(X2APIC);
+
+ /* TODO: to disable KVM_FEATURE_PV_UNHALT */
+}
+
int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -234,6 +249,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
goto out;
vcpu->arch.cpuid_nent = cpuid->nent;
+
+ kvm_clear_invalid_cpuid(vcpu);
kvm_apic_set_version(vcpu);
kvm_x86_ops->cpuid_update(vcpu);
r = kvm_update_cpuid(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19f9f9e05c2a..99028df615e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3812,6 +3812,13 @@ split_irqchip_unlock:
r = 0;
break;
+ case KVM_CAP_CLEAR_INVALID_CPUID:
+ r = -EEXIST;
+ if (kvm->created_vcpus)
+ break;
+ kvm->clear_invalid_cpuid = true;
+ r = 0;
+ break;
default:
r = -EINVAL;
break;
next prev parent reply other threads:[~2016-08-15 17:51 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-10 18:27 kvm_pv_unhalt and kernel_irqchip=off Eduardo Habkost
2016-08-10 19:04 ` Radim Krčmář
2016-08-10 20:58 ` Radim Krčmář
2016-08-12 18:37 ` GET_SUPPORTED_CPUID and irqchip (was Re: kvm_pv_unhalt and kernel_irqchip=off) Eduardo Habkost
2016-08-13 12:43 ` Radim Krčmář
2016-08-15 13:03 ` Eduardo Habkost
2016-08-15 13:21 ` Paolo Bonzini
2016-08-15 17:50 ` Radim Krčmář [this message]
2016-08-15 18:00 ` Eduardo Habkost
2016-08-15 18:32 ` Radim Krčmář
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160815175059.GA12532@potion \
--to=rkrcmar@redhat.com \
--cc=drjones@redhat.com \
--cc=ehabkost@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).