From: Marcelo Tosatti <mtosatti@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Haozhong Zhang <haozhong.zhang@intel.com>,
libvir-list@redhat.com
Subject: Re: [Qemu-devel] TSC frequency configuration & invtsc migration (was Re: [PATCH 4/4] kvm: Allow migration with invtsc)
Date: Fri, 6 Jan 2017 08:31:27 -0200 [thread overview]
Message-ID: <20170106103126.GA1575@amt.cnet> (raw)
In-Reply-To: <20170105121950.GP3315@thinpad.lan.raisama.net>
On Thu, Jan 05, 2017 at 10:19:50AM -0200, Eduardo Habkost wrote:
> On Thu, Jan 05, 2017 at 08:48:32AM -0200, Marcelo Tosatti wrote:
> > On Wed, Jan 04, 2017 at 11:36:31PM -0200, Eduardo Habkost wrote:
> > > On Wed, Jan 04, 2017 at 08:26:27PM -0200, Marcelo Tosatti wrote:
> > > > On Wed, Jan 04, 2017 at 05:59:17PM -0200, Eduardo Habkost wrote:
> > > > > On Wed, Jan 04, 2017 at 11:39:16AM -0200, Eduardo Habkost wrote:
> > > > > > On Wed, Jan 04, 2017 at 09:56:56AM -0200, Marcelo Tosatti wrote:
> > > > > > > On Tue, Dec 27, 2016 at 05:21:20PM -0200, Eduardo Habkost wrote:
> > > > > > > > Instead of blocking migration on the source when invtsc is
> > > > > > > > enabled, rely on the migration destination to ensure there's no
> > > > > > > > TSC frequency mismatch.
> > > > > > > >
> > > > > > > > We can't allow migration unconditionally because we don't know if
> > > > > > > > the destination is a QEMU version that is really going to ensure
> > > > > > > > there's no TSC frequency mismatch. To ensure we are migrating to
> > > > > > > > a destination that won't ignore SET_TSC_KHZ errors, allow invtsc
> > > > > > > > migration only on pc-*-2.9 and newer.
> > > > > > > >
> > > > > > > > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> > > > > [...]
> > > > > > > > @@ -2655,12 +2656,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
> > > > > > > > }
> > > > > > > >
> > > > > > > > if (level == KVM_PUT_FULL_STATE) {
> > > > > > > > - /* We don't check for kvm_arch_set_tsc_khz() errors here,
> > > > > > > > - * because TSC frequency mismatch shouldn't abort migration,
> > > > > > > > - * unless the user explicitly asked for a more strict TSC
> > > > > > > > - * setting (e.g. using an explicit "tsc-freq" option).
> > > > > > > > + /* Migration TSC frequency mismatch is fatal only if we are
> > > > > > > > + * actually reporting Invariant TSC to the guest.
> > > > > > > > */
> > > > > > > > - kvm_arch_set_tsc_khz(cpu);
> > > > > > > > + ret = kvm_arch_set_tsc_khz(cpu);
> > > > > > > > + if ((x86_cpu->env.features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC) &&
> > > > > > > > + ret < 0) {
> > > > > > > > + return ret;
> > > > > > > > + }
> > > > > > > > }
> > > > > > >
> > > > > > > Will the guest continue in the source in this case?
> > > > > > >
> > > > > > > I think this is past the point where migration has been declared
> > > > > > > successful.
> > > > > > >
> > > > > > > Otherwise looks good.
> > > > > >
> > > > > > Good point. I will make additional tests and see if there's some
> > > > > > other place where the kvm_arch_set_tsc_khz() call can be moved
> > > > > > to.
> > > > >
> > > > > So, if we solve this and do something on (for example) post_load,
> > > > > we still have a problem: device state is migrated after RAM. This
> > > > > means QEMU will check for TSC scaling and abort migration very
> > > > > late.
> > > > >
> > > > > We could solve that by manually registering a SaveVMHandler that
> > > > > will send the TSC frequency on save_live_setup, so migration is
> > > > > aborted earlier.
> > > > >
> > > > > But: this sounds like just a complex hack to work around the real
> > > > > problems:
> > > > >
> > > > > 1) TSC frequency is guest-visible, and anything that affects
> > > > > guest ABI should depend on the VM configuration, not on host
> > > > > capabilities;
> > > >
> > > > Well not really: the TSC frequency where the guest starts
> > > > is the frequency the guest software expects.
> > > > So it does depend on host capabilities.
> > >
> > > Could you explain where this expectation comes from? I can see
> > > multiple cases where choosing the TSC frequency where the VM
> > > starts is not the best option.
> >
> > 1. Boot guest.
> > 2. Calibrate TSC.
> > 3. Use delay() with TSC calibration above, or
> > use TSC to measure the passage of time (TSC clock
> > in userspace).
>
> If TSC scaling is available, using a different frequency should
> be safe, shouldn't it? Otherwise, migrating with TSC scaling
> wouldn't be safe either.
Yes, but if you don't have TSC scaling, you have to boot with the
host frequency.
> Anyway: I don't disagree the starting host frequency is a good
> default. It is. But I don't think it's the best option on all
> cases.
Agree.
> >
> > > I am considering two possible scenarios below. You probably have
> > > another scenario in mind, so it would be useful if you could
> > > describe it so we can consider possible solutions.
> > >
> > >
> > > Scenario 1:
> > >
> > > You have two hosts: A and B, both with TSC scaling. They have
> > > different frequencies. The VM can be started on either one of
> > > them, and can be migrated to either one depending on the policy
> > > of management software.
> > >
> > > Maybe B is a backup host, the VM is expected to run most times on
> > > host A, and we want to use the TSC frequency from host A every
> > > time. Maybe it's the opposite and we want to use the frequency of
> > > B. Maybe we want to use the lowest frequency of both, maybe the
> > > highest one. We (QEMU developers) can recommend policy to libvirt
> > > developers, but a given QEMU process doesn't have information to
> > > decide what's best.
> >
> > I can't see any practical scenario here, you will always want
> > to start with TSC frequency of the host where the VM was started.
> >
> > If i am mistaken, please describe a practical case.
> >
> > (If a practical scenario comes up, and there is a use-case
> > for setting the TSC frequency on startup, lets say
> > a Windows VM which fails to boot if the TSC frequency
> > is too high, then it should be supported... But the
> > only known scenario to me, applying to 99.999% of cases,
> > is to expose the TSC frequency where the guest booted at).
>
> I don't have any specific case: my point is that I can't tell
> what's the best frequency if I don't know where the hosts are
> expected to be migrated to.
>
> You claim that using the starting host frequency is the best
> option on the vast majority of cases. Maybe it's true, and it
> would be a good default. The only problem is that this default
> affects migratability:
>
> >
> > > Scenario 2:
> > >
> > > Host A has TSC scaling, host B doesn't have TSC scaling. We want
> > > to be able to start the VM on host A, and migrate to B. In this
> > > case, the only possible solution is to use B's frequency when
> > > starting the VM. The QEMU process doesn't have enough information
> > > to make that decision.
> >
> > That is a good point. But again, its a special case and
> > should be supported by -cpu xxx,tsc-frequency=zzzz.
> >
> > However, for the vast majority of 99.999% cases, the issue
> > can be handled entirely in QEMU, without libvirt involvement,
> > and without adding extra steps to the management software.
>
> I agree it should cover most cases. The only problem here is that
> it can break migration in unexpected ways.
>
> Then my point is: assuming that libvirt will prefer to require
> explicit TSC frequency configuration to enable invtsc migration
> (instead of getting unpredictable migration compatibility), is
> the added complexity to migration code worth the effort, if
> choosing an explicit frequency is safer and more predictable? I
> believe this is where we disagree.
>
> >
> > > > > 2) Setting TSC frequency depending on the host will make
> > > > > migratability unpredictable for management software: the same
> > > > > VM config could be migratable to host A when started on host
> > > > > B, but not migratable to host A when started on host C.
> > > >
> > > > Well, just check the frequency.
> > >
> > > What do you mean by "just check the frequency"? What exactly
> > > should management software do?
> >
> > VM booted on host-A can be migrated to host-B if TSC
> > frequency matches.
>
> Except when TSC scaling is available. Then you may or may not
> migrate from A to B, and you don't know that unless you actually
> try to migrate.
>
> >
> > > Whatever management software do, if you just use the source host
> > > frequency you can get into a situation where you can start a VM
> > > on host A and migrate it to B, but can't start the VM on host B
> > > and migrate it to A. This would be confusing for users, and
> > > probably break assumptions of existing management software.
> >
> > Well this is a side effect of invariant TSC and migration.
> >
> > Look, i agree with all your points, but my point is this: i personally
> > prefer to handle the 99.999% case, which is the TSC frequency exposed is the one
> > from the host where the guest booted, in QEMU entirely (for example, to
> > make life easier for people who run qemu manually such as myself).
>
> Right. I don't mind not covering 100% of cases. But I worry if
> the cases not covered by us behave unexpectedly and
> unpredictably. If we caused a failure every time an
> unsafe/unsupported config was used, it would be OK. But making
> the ability to migrate unpredictable is a more serious issue IMO.
>
> I believe we both agree about how the final version should
> behave, but disagree about what is more important in the first
> version:
>
> * My proposal for the first version means:
> * People would have to configure TSC frequency manually if
> they want invtsc + migration (until we also provide a
> mechanism to query the host TSC frequency so
> management/scripts could choose it as default);
> * Adding more code to libvirt.
> * Your proposal for the first version means:
> * The ability to migrate won't be predictable by libvirt;
> * Extra complexity on the migration code to ensure
> we abort migration on mismatch.
>
> We seem to be weighting those issues differently. To me, having
> predictability on migration ability since the first version is
> more important to me than making configuration easier (on the
> first version).
>
> >
> > > > > I suggest we allow migration with invtsc if and only if
> > > > > tsc-frequency is set explicitly by management software. In other
> > > > > words, apply only patches 1/4 and 2/4 from this series. After
> > > > > that, we will need libvirt support for configuring tsc-frequency.
> > > >
> > > > I don't like that for the following reasons:
> > > >
> > > > * It moves low level complexity from QEMU/KVM to libvirt
> > > > (libvirt has to call KVM_GET_TSC_KHZ from a vcpu thread).
> > >
> > > It doesn't. It could ask QEMU for that information (the new
> > > query-cpu-model-expansion QMP command would allow that).
> > >
> > > Or, alternatively, it could just let the user choose the
> > > frequency.
> >
> > Again, you want to expose the host where the VM booted in most
> > cases (except the ones you list above).
> >
> > > It's probably acceptable for many use cases where
> > > invtsc+migration is important.
> >
> > Ok, thats better. Can you add a patch to your series with the steps
> > as how mgmt software should proceed ?
> >
> > > > * It makes it difficult to run QEMU manually (i use QEMU
> > > > manually all the time).
> > >
> > > I believe the set of people that: 1) need invtsc; 2) need
> > > migration to work; and 3) run QEMU manually will be able to add
> > > "tsc-frequency=XXX" to the command-line easily. :)
> >
> > Ok, so migration is only allowed if tsc-frequency= is specified.
> >
> > > > * It requires patching libvirt.
> > > >
> > > > In my experience things work better when the functionality is
> > > > not split across libvirt/qemu.
> > >
> > > In my experience things break when management software is the
> > > only component able to make a decision but we don't provide
> > > mechanisms to let management software make that decision.
> > >
> > > The TSC frequency affects migratability to hosts. Choose the
> > > wrong frequency, and you might not be able to migrate. Only
> > > management software knows to which hosts the VM could be migrated
> > > in the future, and which TSC frequency would allow that.
> >
> > True.
> >
> > > > Can't this be fixed in QEMU? Just check that destination host supports
> > > > TSC scaling before migration (or that KVM_GET_TSC_KHZ return value
> > > > matches on source and destination).
> > > >
> > >
> > > This solves only one use case: where you want to expose the
> > > starting host TSC frequency to the guest. It doesn't cover any
> > > scenario where the TSC frequency of the starting host isn't the
> > > best one. (See the two scenarios I described above)
> >
> > True.
> >
> > > You seem to have a specific use case in mind. If you describe it,
> > > we could decide if it's worth the extra migration code complexity
> > > to deal with invtsc migration without explicit tsc-frequency. By
> > > now, I am not convinced yet.
> >
> > I don't have any specific use case in mind. I'm just trying to
> > avoid moving complexity to libvirt which in my experience is a
> > the best thing to do.
>
> I think both our proposals make things harder for libvirt in
> different ways. I just want to make the complexity explicit for
> libvirt, instead of giving them the illusion of safety (making
> migration break in ways libvirt can't detect).
>
> Anyway, your points are still valid and it would be still useful
> to do what you propose. I will give it a try on a new version of
> patch 4/4 that can abort migration earlier. But note that I
> expect some pushback from other maintainers because of the
> complexity of the code. If that happens, be aware that I won't be
> able to justify the added complexity because I would still think
> it's not worth it.
>
> >
> > > Maybe your use case just needs a explicit "invtsc-migration=on"
> > > command-line flag without a mechanism to abort migration on
> > > mismatch? I can't tell.
> >
> > Again, there is no special case.
> >
> > > Note that even if we follow your suggestion and implement an
> > > alternative version of patch 4/4 to cover your use case, I will
> > > strongly recommend libvirt developers to support configuring TSC
> > > frequency if they want to support invtsc + migration without
> > > surprising/unpredictable restrictions on migratability.
> >
> > Well, alright. If you make the TSC frequency of the host
> > available to mgmt software as you describe, and write the steps mgmt
> > software should take, i'm good.
>
> I plan to. The problem is that the mechanism to query the host
> frequency may be unavailable in the first version.
Well just export KVM_GET_TSC_KHZ in a QMP command right? Its pretty
easy.
Let me know if you need any help coding or testing.
next prev parent reply other threads:[~2017-01-06 10:32 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-27 19:21 [Qemu-devel] [PATCH 0/4] Allow migration with invtsc if there's no frequency mismatch Eduardo Habkost
2016-12-27 19:21 ` [Qemu-devel] [PATCH 1/4] kvm: Simplify invtsc check Eduardo Habkost
2016-12-29 18:57 ` Marcelo Tosatti
2016-12-27 19:21 ` [Qemu-devel] [PATCH 2/4] kvm: Allow invtsc migration if tsc-khz is set explicitly Eduardo Habkost
2017-01-04 11:44 ` Marcelo Tosatti
2017-01-04 11:57 ` Marcelo Tosatti
2017-01-04 13:40 ` Eduardo Habkost
2017-01-04 13:45 ` Marcelo Tosatti
2017-01-04 13:51 ` Eduardo Habkost
2016-12-27 19:21 ` [Qemu-devel] [PATCH 3/4] pc: Add 2.9 machine-types Eduardo Habkost
2016-12-30 13:38 ` Igor Mammedov
2017-01-04 14:01 ` Laszlo Ersek
2017-01-04 14:20 ` Eduardo Habkost
2017-01-04 16:40 ` Laszlo Ersek
2017-01-04 17:46 ` Eduardo Habkost
2016-12-27 19:21 ` [Qemu-devel] [PATCH 4/4] kvm: Allow migration with invtsc Eduardo Habkost
2017-01-04 11:56 ` Marcelo Tosatti
2017-01-04 13:39 ` Eduardo Habkost
2017-01-04 19:59 ` [Qemu-devel] TSC frequency configuration & invtsc migration (was Re: [PATCH 4/4] kvm: Allow migration with invtsc) Eduardo Habkost
2017-01-04 22:26 ` Marcelo Tosatti
2017-01-05 1:36 ` Eduardo Habkost
2017-01-05 10:48 ` Marcelo Tosatti
2017-01-05 10:50 ` Marcelo Tosatti
2017-01-05 12:19 ` Eduardo Habkost
2017-01-05 12:33 ` [Qemu-devel] [libvirt] " Daniel P. Berrange
2017-01-05 12:48 ` Eduardo Habkost
2017-01-05 13:00 ` Daniel P. Berrange
2017-01-05 13:11 ` Eduardo Habkost
2017-01-10 16:38 ` Paolo Bonzini
2017-01-06 10:31 ` Marcelo Tosatti [this message]
2017-01-08 15:49 ` [Qemu-devel] How to make dest host abort migration safely " Eduardo Habkost
2017-01-09 10:04 ` Dr. David Alan Gilbert
2017-01-08 20:28 ` [Qemu-devel] Exporting kvm_max_guest_tsc_khz to userspace " Eduardo Habkost
2017-01-09 14:58 ` Paolo Bonzini
2017-01-11 13:26 ` Eduardo Habkost
2017-01-11 14:06 ` Paolo Bonzini
2017-01-10 16:36 ` [Qemu-devel] TSC frequency configuration & invtsc migration " Paolo Bonzini
2017-01-11 11:58 ` Eduardo Habkost
2017-01-18 11:55 ` Marcelo Tosatti
2017-01-18 12:43 ` Eduardo Habkost
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170106103126.GA1575@amt.cnet \
--to=mtosatti@redhat.com \
--cc=ehabkost@redhat.com \
--cc=haozhong.zhang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=libvir-list@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).