From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Eduardo Habkost <ehabkost@redhat.com>,
kvm@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] invtsc + migration + TSC scaling
Date: Tue, 18 Oct 2016 10:04:57 +0100
Message-ID: <20161018090457.GC2190@work-vm>
In-Reply-To: <20161017145008.GA2307@potion>

* Radim Krčmář (rkrcmar@redhat.com) wrote:
> 2016-10-17 07:47-0200, Marcelo Tosatti:
> > On Fri, Oct 14, 2016 at 06:20:31PM -0300, Eduardo Habkost wrote:
> >> I have been wondering: should we allow live migration with the
> >> invtsc flag enabled, if TSC scaling is available on the
> >> destination?
> >
> > TSC scaling and invtsc flag, yes.
>
> Yes, if we have well synchronized time between hosts, then we might be
> able to migrate with a TSC shift that cannot be perceived by the guest.
>
> Unless the VM also has a migratable assigned PCI device that uses ART,
> because we have no protocol to update the setting of ART (in CPUID), so
> we should keep migration forbidden then.
>
> >> For reference, this is what the Intel SDM says about invtsc:
> >>
> >> The time stamp counter in newer processors may support an
> >> enhancement, referred to as invariant TSC. Processor’s support
> >> for invariant TSC is indicated by CPUID.80000007H:EDX[8].
> >>
> >> The invariant TSC will run at a constant rate in all ACPI P-,
> >> C-, and T-states. This is the architectural behavior moving
> >> forward. On processors with invariant TSC support, the OS may
> >> use the TSC for wall clock timer services (instead of ACPI or
> >> HPET timers). TSC reads are much more efficient and do not
> >> incur the overhead associated with a ring transition or access
> >> to a platform resource.
> >
> > Yes. The blockage happened for different reasons:
> >
> > 1) Migration: to host with different TSC frequency.
>
> We shouldn't have done this even now when emulating anything newer than
> Pentium 4, because those CPUs have constant TSC, which only lacks the
> guarantee that it doesn't stop in deep C-states:
>
> For [a list of processors we emulate]: the time-stamp counter
> increments at a constant rate. That rate may be set by the maximum
> core-clock to bus-clock ratio of the processor or may be set by the
> maximum resolved frequency at which the processor is booted. The
> maximum resolved frequency may differ from the processor base
> frequency, see Section 18.18.2 for more detail. On certain processors,
> the TSC frequency may not be the same as the frequency in the brand
> string.
>
> The specific processor configuration determines the behavior. Constant
> TSC behavior ensures that the duration of each clock tick is uniform
> and supports the use of the TSC as a wall clock timer even if the
> processor core changes frequency. This is the architectural behavior
> moving forward.
>
> Invariant TSC is more useful, though, so more applications would break
> when migrating to a different TSC frequency.
>
> > 2) Savevm: It is not safe to use the TSC for wall clock timer
> > services.
>
> With constant TSC, we could argue that a shift to deep C-state happened
> and paused TSC, which is not a good behavior, but somewhat defensible.
>
> > By allowing savevm, you make a commitment to allow a feature
> > at the expense of not complying with the spec (specifically the "
> > the OS may use the TSC for wall clock timer services", because the
> > TSC stops relative to realtime for the duration of the savevm stop
> > window).
>
> Yep, we should at least guesstimate the TSC to allow the guest to resume
> with as small TSC-shift as possible and check that hosts were somewhat
> synchronized with UTC (or something we choose for time).
>
> > But since Linux guests use kvmclock and Windows guests use Hyper-V
> > enlightenment, it should be fine to disable 2).
> >
> > There is a bug open for this, btw:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1353073
>
> These people should be happy with just live-migrations, so can't we just
> keep savevm forbidden?

Why is savevm so much harder? Is it just the difference in real time?
If so, then I do worry about how small a difference you're hoping
to guarantee in live migration.
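
To put rough numbers on that question, here is a minimal sketch of how
far the guest TSC would lag real time after a stop window. Everything in
it is an assumption for illustration: the helper names, the 2.5 GHz TSC
frequency, the 300 ms live-migration downtime, and the one-hour savevm
gap are hypothetical and are not taken from QEMU or KVM.

```python
# Illustrative arithmetic only; names and figures are assumptions,
# not QEMU/KVM code or measured values.

def ticks_for(duration_ns: int, tsc_khz: int) -> int:
    """TSC ticks a counter running at tsc_khz accumulates in duration_ns."""
    # kHz is ticks per millisecond, so ticks = ns * kHz / 1e6.
    return duration_ns * tsc_khz // 1_000_000


def residual_shift_ticks(stop_ns: int, compensated_ns: int, tsc_khz: int) -> int:
    """Ticks by which the guest TSC lags real time after resume.

    stop_ns is how long the guest was really stopped; compensated_ns is
    how far the resuming side advanced the TSC (0 means the TSC simply
    froze for the whole window).
    """
    return ticks_for(stop_ns, tsc_khz) - ticks_for(compensated_ns, tsc_khz)


if __name__ == "__main__":
    tsc_khz = 2_500_000                     # assumed 2.5 GHz guest TSC
    live_downtime_ns = 300 * 1_000_000      # assumed 300 ms downtime
    savevm_gap_ns = 3600 * 1_000_000_000    # assumed 1 hour until loadvm

    # No compensation: the shift is the whole stop window.
    print(residual_shift_ticks(live_downtime_ns, 0, tsc_khz))
    print(residual_shift_ticks(savevm_gap_ns, 0, tsc_khz))

    # Compensation that is off by 1 ms (e.g. host clocks not quite in sync).
    print(residual_shift_ticks(live_downtime_ns, 299 * 1_000_000, tsc_khz))
```

Under these assumptions the two cases differ only in the size of the
window, and any compensation is only as good as the clock agreement
between the two sides.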
Dave
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK