From: Igor Mammedov <imammedo@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>,
qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Evgeny Yakovlev <eyakovlev@virtuozzo.com>,
"Denis V . Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu_index
Date: Wed, 14 Jun 2017 15:24:50 +0200 [thread overview]
Message-ID: <20170614152450.199121ee@nial.brq.redhat.com> (raw)
In-Reply-To: <20170614130015.GC5016@thinpad.lan.raisama.net>
On Wed, 14 Jun 2017 10:00:15 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:
> On Wed, Jun 14, 2017 at 02:25:07PM +0300, Roman Kagan wrote:
> > On Tue, Jun 13, 2017 at 03:57:52PM -0300, Eduardo Habkost wrote:
> > > On Tue, Jun 06, 2017 at 09:19:30PM +0300, Roman Kagan wrote:
> > > > Hyper-V identifies vcpus by the virtual processor (VP) index which is
> > > > normally queried by the guest via HV_X64_MSR_VP_INDEX msr.
> > > >
> > > > It has to be owned by QEMU in order to preserve it across migration.
> > > >
> > > > However, the current implementation in KVM doesn't allow to set this
> > > > msr, and KVM uses its own notion of VP index. Fortunately, the way
> > > > vcpus are created in QEMU/KVM basically guarantees that the KVM value is
> > > > equal to QEMU cpu_index.
> > >
> > > This might not be true in the future. cpu_index is not a good
> > > identifier for CPUs, and we would like to remove it in the
> > > future.
> > >
> > > But it looks like we have no choice, see extra comments below:
> >
> > > > +static void hyperv_set_vp_index(CPUState *cs)
> > > > +{
> > > > + struct {
> > > > + struct kvm_msrs info;
> > > > + struct kvm_msr_entry entries[1];
> > > > + } msr_data;
> > > > + int ret;
> > > > +
> > > > + msr_data.info.nmsrs = 1;
> > > > + msr_data.entries[0].index = HV_X64_MSR_VP_INDEX;
> > > > +
> > > > + /*
> > > > + * Some kernels don't support setting HV_X64_MSR_VP_INDEX. However,
> > > > + * they use VP_INDEX equal to cpu_index due to the way vcpus are created.
> > > > + * So ignore the errors from SET_MSRS, but verify that GET_MSRS returns the
> > > > + * expected value.
> > > > + */
> > >
> > > Oh. This sounds like a problem. As reference for others, this
> > > is the KVM code in kvm_hv_get_msr():
> > >
> > > case HV_X64_MSR_VP_INDEX: {
> > > int r;
> > > struct kvm_vcpu *v;
> > >
> > > kvm_for_each_vcpu(r, v, vcpu->kvm) {
> > > if (v == vcpu) {
> > > data = r;
> > > break;
> > > }
> > > }
> > > break;
> > > }
> > >
> > > The problem with that is that it will break as soon as we create
> > > VCPUs in a different order. Unsolvable on hosts that don't allow
> > > HV_X64_MSR_VP_INDEX to be set, however.
> >
> > Right, thanks for putting together a detailed explanation.
> >
> > This was a thinko back then, not to have HV_X64_MSR_VP_INDEX maintained
> > by QEMU. I'm going to post a patch to KVM fixing that.
> >
> > Meanwhile QEMU needs a way to maintain its notion of vp_index that is
> > 1) in sync with kernel's notion
> > 2) also with kernels that don't support setting the msr
> > 3) persistent across migrations
> >
> > cpu_index looked like a perfect candidate.
>
> I would like to be able to stop using cpu_index as a persistent
> VCPU identifier in the future. I would also like to be able to
> create VCPUs in any order without breaking guest ABI. But it
> seems to be impossible to do that without breaking vp_index on
> older kernels.
>
> So cpu_index looks like the only candidate we have.
>
> >
> > > > + msr_data.entries[0].data = cs->cpu_index;
> > > > + kvm_vcpu_ioctl(cs, KVM_SET_MSRS, &msr_data);
> > > > + ret = kvm_vcpu_ioctl(cs, KVM_GET_MSRS, &msr_data);
> > > > + assert(ret == 1);
> > > > + assert(msr_data.entries[0].data == cs->cpu_index);
> > >
> > > If KVM already initializes the MSR to cpu_index and we will abort
> > > if it was not set to anything except cpu_index here, why exactly
> > > do we need the KVM_SET_MSRS call?
> >
> > The kernel made no obligations to keep initializing its vp_index
> > identically to QEMU's cpu_index. So we'd better check and abort if that
> > got out of sync. Once KVM gains the ability to learn vp_index from QEMU
> > we'll no longer depend on that as we'll do explicit synchronization.
>
> I think the kernel now has an obligation to keep initializing
> HV_X64_MSR_VP_INDEX in a compatible way, or it will break older
> QEMU versions that don't set the MSR.
>
> But if you don't think we can rely on that, then the KVM_SET_MSRS
> call won't hurt.
if we can tell apart old index based vp_index kernel and new that supports
MSR setting (add cap check perhaps) then we should be able to
- leave out old vp_index broken as it is now (for old kernels/old QEMU and/ new QEMU old machine types)
i.e. do not set vp_index MSR in QEMU
- in case of (new QEMU new machine type/new kernel) use APIC ID which would work
with out of order CPU creation
next prev parent reply other threads:[~2017-06-14 13:25 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-06 18:19 [Qemu-devel] [PATCH 00/23] hyperv fixes and enhancements Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 01/23] hyperv: add header with protocol definitions Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 02/23] update-linux-headers: prepare for hyperv.h removal Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 03/23] hyperv: set partition-wide MSRs only on first vcpu Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 04/23] hyperv: ensure msrs are inited properly Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu_index Roman Kagan
2017-06-13 18:57 ` Eduardo Habkost
2017-06-14 11:25 ` Roman Kagan
2017-06-14 11:26 ` Paolo Bonzini
2017-06-14 13:00 ` Igor Mammedov
2017-06-15 12:41 ` Roman Kagan
2017-06-15 13:22 ` Paolo Bonzini
2017-06-15 13:27 ` Igor Mammedov
2017-06-15 16:05 ` Roman Kagan
2017-06-18 15:29 ` Eduardo Habkost
2017-06-14 13:01 ` Eduardo Habkost
2017-06-14 13:11 ` Igor Mammedov
2017-06-14 13:17 ` Paolo Bonzini
2017-06-14 13:22 ` Eduardo Habkost
2017-06-14 13:37 ` Paolo Bonzini
2017-06-14 13:38 ` Igor Mammedov
2017-06-14 13:45 ` Eduardo Habkost
2017-06-14 18:40 ` Roman Kagan
2017-06-14 18:59 ` Eduardo Habkost
2017-06-15 8:26 ` Paolo Bonzini
2017-06-15 11:40 ` Roman Kagan
2017-06-15 11:42 ` Paolo Bonzini
2017-06-15 12:03 ` Roman Kagan
2017-06-14 13:19 ` Eduardo Habkost
2017-06-14 13:00 ` Eduardo Habkost
2017-06-14 13:24 ` Igor Mammedov [this message]
2017-06-14 13:35 ` Eduardo Habkost
2017-06-14 15:31 ` Igor Mammedov
2017-06-06 18:19 ` [Qemu-devel] [PATCH 06/23] hyperv: helper to find vcpu by VP index Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 07/23] hyperv_testdev: refactor for readability Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 08/23] hyperv: cosmetic: g_malloc -> g_new Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 09/23] hyperv: synic: only setup ack notifier if there's a callback Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 10/23] hyperv: allow passing arbitrary data to sint ack callback Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 12/23] hyperv: make HvSintRoute reference-counted Roman Kagan
2017-06-14 13:53 ` Eduardo Habkost
2017-06-14 16:23 ` Roman Kagan
2017-06-23 12:44 ` Eduardo Habkost
2017-06-06 18:19 ` [Qemu-devel] [PATCH 13/23] hyperv: qdev-ify SynIC Roman Kagan
2017-06-13 18:34 ` Eduardo Habkost
2017-06-14 9:58 ` Roman Kagan
2017-06-14 12:46 ` Eduardo Habkost
2017-06-14 15:11 ` Roman Kagan
2017-06-14 15:21 ` Eduardo Habkost
2017-06-06 18:19 ` [Qemu-devel] [PATCH 14/23] kvm-all: make async_safe_run_on_cpu safe on kvm too Roman Kagan
2017-06-08 14:47 ` Paolo Bonzini
2017-06-06 18:19 ` [Qemu-devel] [PATCH 15/23] hyperv: make overlay pages for SynIC Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 16/23] hyperv: map overlay pages after updating msrs Roman Kagan
2017-06-14 11:12 ` Paolo Bonzini
2017-06-14 11:54 ` Roman Kagan
2017-06-14 12:11 ` Paolo Bonzini
2017-06-14 12:41 ` Roman Kagan
2017-06-14 12:46 ` Paolo Bonzini
2017-06-06 18:19 ` [Qemu-devel] [PATCH 17/23] hyperv: add synic message delivery Roman Kagan
2017-06-14 15:08 ` Paolo Bonzini
2017-06-14 15:28 ` Roman Kagan
2017-06-14 15:32 ` Paolo Bonzini
2017-06-14 15:39 ` Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 18/23] hyperv: add synic event flag signaling Roman Kagan
2017-06-14 15:07 ` Paolo Bonzini
2017-06-06 18:19 ` [Qemu-devel] [PATCH 19/23] hyperv: process SIGNAL_EVENT hypercall Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 20/23] hyperv: process POST_MESSAGE hypercall Roman Kagan
2017-06-14 11:19 ` Paolo Bonzini
2017-06-14 14:20 ` Roman Kagan
2017-06-14 14:30 ` Paolo Bonzini
2017-06-06 18:19 ` [Qemu-devel] [PATCH 21/23] hyperv_testdev: add SynIC message and event testmodes Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 22/23] MAINTAINERS: add myself and eyakovlev@ for hyperv* Roman Kagan
2017-06-06 18:19 ` [Qemu-devel] [PATCH 23/23] hyperv: update copyright notices Roman Kagan
[not found] ` <20170606181948.16238-12-rkagan@virtuozzo.com>
2017-06-13 19:02 ` [Qemu-devel] [PATCH 11/23] hyperv: address HvSintRoute by X86CPU pointer Eduardo Habkost
2017-06-14 11:08 ` Paolo Bonzini
2017-06-14 12:14 ` Roman Kagan
2017-06-14 12:17 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170614152450.199121ee@nial.brq.redhat.com \
--to=imammedo@redhat.com \
--cc=den@openvz.org \
--cc=ehabkost@redhat.com \
--cc=eyakovlev@virtuozzo.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rkagan@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).