From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34213)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ezSfz-0001U2-H9 for qemu-devel@nongnu.org;
	Fri, 23 Mar 2018 15:48:37 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ezSfu-0000Xf-BL for qemu-devel@nongnu.org;
	Fri, 23 Mar 2018 15:48:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39948)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from ) id 1ezSfu-0000XM-1W
	for qemu-devel@nongnu.org; Fri, 23 Mar 2018 15:48:30 -0400
Date: Fri, 23 Mar 2018 16:48:27 -0300
From: Eduardo Habkost
Message-ID: <20180323194827.GM3417@localhost.localdomain>
References: <20180320173500.32065-3-vkuznets@redhat.com>
	<20180321121001.GD14983@rkaganb.sw.ru>
	<87bmfh7eip.fsf@vitty.brq.redhat.com>
	<20180321165729.GH14983@rkaganb.sw.ru>
	<20180321201924.GE3417@localhost.localdomain>
	<20180322130013.GB4500@rkaganb.sw.ru>
	<20180322132218.GH3417@localhost.localdomain>
	<20180322135803.GC4500@rkaganb.sw.ru>
	<20180322183813.GA28161@localhost.localdomain>
	<20180323094529.GA28085@rkaganb.sw.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180323094529.GA28085@rkaganb.sw.ru>
Subject: Re: [Qemu-devel] [PATCH v3 2/2] i386/kvm: lower requirements for
	Hyper-V frequency MSRs exposure
To: Roman Kagan, Vitaly Kuznetsov, Paolo Bonzini, Richard Henderson,
	Marcelo Tosatti, qemu-devel@nongnu.org

On Fri, Mar 23, 2018 at 12:45:30PM +0300, Roman Kagan wrote:
> On Thu, Mar 22, 2018 at 03:38:13PM -0300, Eduardo Habkost wrote:
> > On Thu, Mar 22, 2018 at 04:58:03PM +0300, Roman Kagan wrote:
> > > On Thu, Mar 22, 2018 at 10:22:18AM -0300, Eduardo Habkost wrote:
> > > > On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> > > > > On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > > > > > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > > > > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > > > > > Roman Kagan writes:
> > > > > > > >
> > > > > > > > > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> > > > > > > > >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> > > > > > > > >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> > > > > > > > >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> > > > > > > > >> these frequencies are stable, it just means they're available for reading.
> > > > > > > > >>
> > > > > > > > >> Signed-off-by: Vitaly Kuznetsov
> > > > > > > > >> ---
> > > > > > > > >>  target/i386/kvm.c | 2 +-
> > > > > > > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >>
> > > > > > > > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > > > > > > >> index 7d9f9ca0b1..74fc3d3b2c 100644
> > > > > > > > >> --- a/target/i386/kvm.c
> > > > > > > > >> +++ b/target/i386/kvm.c
> > > > > > > > >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> > > > > > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> > > > > > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> > > > > > > > >>
> > > > > > > > >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> > > > > > > > >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> > > > > > > > >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> > > > > > > > >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> > > > > > > > >>          }
> > > > > > > > >
> > > > > > > > > I suggest that we add a corresponding cpu property here, too. The guest
> > > > > > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > > > > > older one will make it crash.
> > > > > > > > >
> > > > > > > >
> > > > > > > > This can be arranged, but what happens to people who use these features
> > > > > > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > > > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > > > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > > > > > they have...
> > > > > > >
> > > > > > > I see two cases here:
> > > > > > >
> > > > > > > 1) people start a new VM, and discover that their old configuration is
> > > > > > >    not enough for this feature to work.
> > > > > > >
> > > > > > >    They need to reconfigure and restart the VM. This costs them some
> > > > > > >    time investigating and restarting, but not data.
> > > > > >
> > > > > > If we keep machine-type compatibility, people will need to do
> > > > > > that only if they change the machine-type (or use the "pc" or
> > > > > > "q35" aliases). If they copy the old configuration, it will keep
> > > > > > working.
> > > > >
> > > > > The problem is that the feature is not fixed by the machine-type, due to
> > > > > the forgotten property: it only depends on the KVM version. So, once
> > > > > (if) we add the property and make the feature deterministic, we'll lose
> > > > > compatibility one way or another.
> > > > >
> > > > > Or are you suggesting that for pre-2.12 machine types we leave the
> > > > > property at "decided by your KVM" state?
> > > >
> > > > Yes, that's what I mean. This looks like the only way to avoid
> > > > losing features by just cold-rebooting an existing VM.
> > > >
> > > > The scenario I'm thinking is this:
> > > >
> > > > 1) pc-2.11 VM started on host running QEMU 2.11
> > > > 2) VM migrated to a host containing this patch
> > > > 3) 1 year later, the VM is shut down and booted again.
> > > > 4) Things stop working inside the VM because hv-frequency is
> > > >    unexpectedly gone.
> > > >
> > > > Machine-type compatibility code would avoid (4).
> > >
> > > Right, but (4) typically means that you fail to start your workload from
> > > a clean state, so you just go and fix it; no data is lost.
> > >
> > > Compare this to a migration to an older KVM which results in your guest
> > > crashing, where you risk data loss and still have to meddle with
> > > configs.
> >
> > True. To make it worse, we are already unable to avoid this crash
> > on existing VMs without a reboot. The only case where we can fix
> > this is if live-migration to older KVM happens after the guest
> > was rebooted when running on a newer QEMU version. :(
>
> Hmm, I thought the scheme I outlined below covered (== blocked) live
> migration QEMU-2.11/KVM-4.14+ -> QEMU-2.12(machine-2.11)/KVM-4.13-,
> didn't it?

It should, but what about migration
QEMU-2.12(pc-2.11)/KVM-4.14 -> QEMU-2.11(pc-2.11)/KVM-4.13?

Or, more specifically:

QEMU-2.11(pc-2.11)/KVM-4.14 -> QEMU-2.12(pc-2.11)/KVM-4.14 ->
QEMU-2.11(machine-2.11)/KVM-4.13?

> > > >
> > > > > > machine-type compatibility also makes the following case a bit
> > > > > > safer:
> > > > > >
> > > > > > >
> > > > > > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > > > > > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > > > > > >    MSRs).
> > > > > > >
> > > > > > >    With the current implementation in KVM, this will only result in the
> > > > > > >    feature bits disappearing from the respective CPUID leaf, but the
> > > > > > >    MSRs themselves will continue working as they used to. So the guest
> > > > > > >    either won't notice or will check the CPUID and adjust.
> > > > > >
> > > > > > If we keep machine-type compatibility, the CPUID bit won't
> > > > > > disappear for the guest while the MSRs keep working.
> > > > > >
> > > > > >
> > > > > > Whichever solution we choose, we can still have guests crashing
> > > > > > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > > > > > with an older kernel. But I don't think there's a way out of
> > > > > > this, except requiring an explicit "hv-frequencies" CPU option on
> > > > > > newer machine-types.
> > > > >
> > > > > What's wrong with requiring it, as we do for all other hv_* properties?
> > > >
> > > > On new machine-types, nothing wrong.
> > > >
> > > > On existing machine-types, see above.
> > >
> > > I wonder if the following can cater to all relevant cases:
> > >
> > > - hv_frequencies property is added, defaulting to "off", so that new
> > >   users of this feature would need to explicitly turn it on;
> > >
> > > - on pre-2.12 machine types, it's set to the value of hv_time property
> > >   by the compat code, so that on VMs where this feature could
> > >   potentially be present it would become required; as a result, these
> > >   configurations will refuse to start on insufficiently capable KVM,
> > >   preventing the migration attempts.
> >
> > This sounds like the safest option. The cost will be the
> > inconvenience of being unable to run pc-2.11 on hosts with older
> > KVM (Linux < v4.14, without commit
> > 72c139bacfa386145d7bbb68c47c8824716153b6),
>
> not completely unable: people will have to add "hv_frequencies=off" to
> their cpu spec

This is different from the patch you sent, which sets
hv-frequencies=off by default on pc-2.11 too.

(And now I see you described this approach in the last paragraph
below. :)

>
> > and the need to explicitly enable hv-frequencies on pc-2.12 and newer.
>
> which is the standard situation for all new features.

Yes, no question on what we want to do on pc-2.12.

>
> > > Am I missing any scenarios that aren't covered?
> > >
> >
> > It looks like the guest can still crash if we migrate
> > "QEMU-2.12 -machine pc-2.11 -cpu ...,+hv-time" to a host running
> > QEMU 2.11 and Linux < 4.14.
>
> Indeed :(

Well, your patch fixes it by not enabling hv-frequencies by
default on any machine-type. Do you see any gotchas?

>
> > I wonder if there's a way to avoid that? If there's a way to avoid
> > that with extra migration subsections,
>
> I guess this should work.
>
> > is it worth the effort/complexity?
>
> This is a judgement call. For vendors this is a non-issue because most
> of them haven't even started shipping 2.11, so they just don't have VMs
> with this problem in the field.
>
> So, taking the effort/complexity vs safety tradeoff into account, we can
> consider an alternative approach: just add hv_frequencies (default=off)
> cpu property to 2.12 and 2.11-stable, and ignore the cases where it's
> run on QEMU versions without explicit control over this feature. Would
> it be too much against the current policy?
>

With this, we can just declare that QEMU v2.11.0 + Linux 4.14+
was broken, and advise people to upgrade QEMU. I think this is the
most reasonable option we have.

See my reply to the patch you sent.

-- 
Eduardo
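
The two defaulting schemes weighed in this thread can be compared with a
small C model. This is only a sketch of the discussion, not QEMU code:
the enum, the struct and resolve_hv_frequencies() are all hypothetical
names invented for illustration, and the "refuse to start" outcome is
modelled as a plain -1 return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the thread; none of these names exist in QEMU. */
enum hv_freq_policy {
    POLICY_COMPAT_FOLLOWS_HV_TIME, /* pre-2.12 machine types inherit hv_time */
    POLICY_DEFAULT_OFF             /* off on every machine type unless asked */
};

struct vm_config {
    bool pre_2_12_machine;  /* e.g. pc-2.11 */
    bool hv_time;           /* hv_time CPU property */
    bool hv_freq_explicit;  /* user passed hv_frequencies=on|off */
    bool hv_freq_value;     /* only meaningful when hv_freq_explicit is set */
};

/*
 * Returns 0 and stores the effective setting in *enabled, or -1 when the
 * feature is required but the host KVM lacks the frequency MSRs. In the
 * scheme discussed above, -1 means the VM refuses to start, which is what
 * prevents migration to an insufficiently capable host.
 */
static int resolve_hv_frequencies(enum hv_freq_policy policy,
                                  const struct vm_config *cfg,
                                  bool kvm_has_freq_msrs,
                                  bool *enabled)
{
    bool want;

    assert(cfg != NULL && enabled != NULL);

    if (cfg->hv_freq_explicit) {
        /* An explicit user setting always wins. */
        want = cfg->hv_freq_value;
    } else if (policy == POLICY_COMPAT_FOLLOWS_HV_TIME &&
               cfg->pre_2_12_machine) {
        /* Compat code: the feature may already be visible to the guest. */
        want = cfg->hv_time;
    } else {
        /* New property defaults to off. */
        want = false;
    }

    if (want && !kvm_has_freq_msrs) {
        return -1;
    }
    *enabled = want;
    return 0;
}
```

In this model, a pc-2.11 VM with hv_time on a pre-4.14 kernel fails under
POLICY_COMPAT_FOLLOWS_HV_TIME unless hv_frequencies=off is given
explicitly, while under POLICY_DEFAULT_OFF it starts with the feature
disabled.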