From: Marcelo Tosatti <mtosatti@redhat.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: invtsc + migration + TSC scaling
Date: Mon, 17 Oct 2016 15:20:06 -0200
Message-ID: <20161017172005.GA24607@amt.cnet>
In-Reply-To: <20161017145008.GA2307@potion>

On Mon, Oct 17, 2016 at 04:50:09PM +0200, Radim Krčmář wrote:
> 2016-10-17 07:47-0200, Marcelo Tosatti:
> > On Fri, Oct 14, 2016 at 06:20:31PM -0300, Eduardo Habkost wrote:
> >> I have been wondering: should we allow live migration with the
> >> invtsc flag enabled, if TSC scaling is available on the
> >> destination?
> > 
> > TSC scaling and invtsc flag, yes.
> 
> Yes, if we have well synchronized time between hosts, then we might be
> able to migrate with a TSC shift that cannot be perceived by the guest.

Even if the guest can't detect the TSC difference (relative to realtime),
I suppose the TSC should be advanced to account for the time the guest was
stopped during migration (so that the TSC appears to have incremented at a
"constant rate").
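
To make the arithmetic concrete, here is a rough sketch of that
adjustment. The function name and its parameters are made up for
illustration (this is not an existing QEMU or KVM helper), and it assumes
the stopped time is known in nanoseconds and the guest TSC frequency in kHz:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU or KVM function): return the TSC value
 * the guest should see on resume so that the counter appears to have kept
 * running at tsc_khz for the downtime_ns nanoseconds it was stopped.
 */
static uint64_t tsc_after_downtime(uint64_t tsc_at_stop,
                                   uint64_t downtime_ns,
                                   uint64_t tsc_khz)
{
    /*
     * ticks = downtime_ns * tsc_khz / 1e6. Split into whole milliseconds
     * and a sub-millisecond remainder so the product stays within 64 bits
     * for realistic frequencies and stop times.
     */
    uint64_t ms = downtime_ns / 1000000;
    uint64_t rem_ns = downtime_ns % 1000000;
    uint64_t ticks = ms * tsc_khz + (rem_ns * tsc_khz) / 1000000;

    return tsc_at_stop + ticks;
}
```

With a 3 GHz guest TSC (tsc_khz = 3000000) and 2 ms of stopped time, the
counter would be moved forward by 6000000 ticks.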

> Unless the VM also has a migratable assigned PCI device that uses ART,
> because we have no protocol to update the setting of ART (in CPUID), so
> we should keep migration forbidden then.

What is the use case for ART again? (I need to catch up on that.)

> 
> >> For reference, this is what the Intel SDM says about invtsc:
> >> 
> >>   The time stamp counter in newer processors may support an
> >>   enhancement, referred to as invariant TSC. Processor’s support
> >>   for invariant TSC is indicated by CPUID.80000007H:EDX[8].
> >> 
> >>   The invariant TSC will run at a constant rate in all ACPI P-,
> >>   C-. and T-states. This is the architectural behavior moving
> >>   forward. On processors with invariant TSC support, the OS may
> >>   use the TSC for wall clock timer services (instead of ACPI or
> >>   HPET timers). TSC reads are much more efficient and do not
> >>   incur the overhead associated with a ring transition or access
> >>   to a platform resource.
> >
> > Yes. The blockage happened for different reasons:
> > 
> > 1) Migration: to host with different TSC frequency.
> 
> We shouldn't have done this even now when emulating anything newer than
> Pentium 4, because those CPUs have constant TSC, which only lacks the
> guarantee that it doesn't stop in deep C-states:
> 
>   For [a list of processors we emulate]: the time-stamp counter
>   increments at a constant rate. That rate may be set by the maximum
>   core-clock to bus-clock ratio of the processor or may be set by the
>   maximum resolved frequency at which the processor is booted. The
>   maximum resolved frequency may differ from the processor base
>   frequency, see Section 18.18.2 for more detail. On certain processors,
>   the TSC frequency may not be the same as the frequency in the brand
>   string.
> 
>   The specific processor configuration determines the behavior. Constant
>   TSC behavior ensures that the duration of each clock tick is uniform
>   and supports the use of the TSC as a wall clock timer even if the
>   processor core changes frequency. This is the architectural behavior
>   moving forward.
> 
> Invariant TSC is more useful, though, so more applications would break
> when migrating to a different TSC frequency.
> 
> > 2) Savevm: It is not safe to use the TSC for wall clock timer
> > services.
> 
> With constant TSC, we could argue that a shift to deep C-state happened
> and paused TSC, which is not a good behavior, but somewhat defensible.
> 
> > By allowing savevm, you make a commitment to allow a feature
> > at the expense of not complying with the spec (specifically the "
> > the OS may use the TSC for wall clock timer services", because the
> > TSC stops relative to realtime for the duration of the savevm stop
> > window).
> 
> Yep, we should at least guesstimate the TSC to allow the guest to resume
> with as small TSC-shift as possible and check that hosts were somewhat
> synchronized with UTC (or something we choose for time).

There are two options for savevm:

Option 1) Stop the TSC for savevm duration.

Option 2) Advance TSC to match realtime (this is known to overflow Linux
timekeeping though).
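
As a sketch of what the two options mean for the value loaded on restore
(the enum and function names are illustrative only, not existing QEMU
options, and the stopped time is assumed to be known in whole seconds):

```c
#include <stdint.h>

/* Illustrative policy names; they are not existing QEMU options. */
enum savevm_tsc_policy {
    SAVEVM_TSC_STOP,     /* option 1: TSC frozen while the VM is saved */
    SAVEVM_TSC_ADVANCE,  /* option 2: TSC catches up with realtime */
};

/* TSC value to load into the guest on restore under the given policy. */
static uint64_t savevm_resume_tsc(enum savevm_tsc_policy policy,
                                  uint64_t saved_tsc,
                                  uint64_t stopped_sec,
                                  uint64_t tsc_hz)
{
    if (policy == SAVEVM_TSC_ADVANCE)
        return saved_tsc + stopped_sec * tsc_hz;
    return saved_tsc;
}

/* Seconds by which a purely TSC-based guest clock lags realtime afterwards. */
static uint64_t savevm_guest_lag_sec(enum savevm_tsc_policy policy,
                                     uint64_t stopped_sec)
{
    return policy == SAVEVM_TSC_STOP ? stopped_sec : 0;
}
```

Option 1 leaves a guest that derives wall clock time from the TSC behind
realtime by the full stop window; option 2 removes that lag at the cost of
one large forward jump in the counter.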


> 
> > But since Linux guests use kvmclock and Windows guests use Hyper-V
> > enlightenment, it should be fine to disable 2).
> > 
> > There is a bug open for this, btw: 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1353073
> 
> These people should be happy with just live-migrations, so can't we just
> keep savevm forbidden?

I don't see why. Perhaps savevm should be considered a "special type of
operation" that deviates from bare-metal behaviour: if the user does
savevm, they accept that the TSC does not count "at a constant rate"
(so savevm breaks invariant TSC behaviour).


