From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49084)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1bwBaS-0008Oh-T9 for qemu-devel@nongnu.org;
    Mon, 17 Oct 2016 13:20:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1bwBaQ-000314-HQ for qemu-devel@nongnu.org;
    Mon, 17 Oct 2016 13:20:32 -0400
Received: from mx1.redhat.com ([209.132.183.28]:55346)
    by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
    (Exim 4.71) (envelope-from ) id 1bwBaP-00030i-Vp
    for qemu-devel@nongnu.org; Mon, 17 Oct 2016 13:20:30 -0400
Date: Mon, 17 Oct 2016 15:20:06 -0200
From: Marcelo Tosatti
Message-ID: <20161017172005.GA24607@amt.cnet>
References: <20161014212031.GQ3275@thinpad.lan.raisama.net>
    <20161017094708.GB31691@amt.cnet> <20161017145008.GA2307@potion>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20161017145008.GA2307@potion>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] invtsc + migration + TSC scaling
To: Radim Krčmář
Cc: Eduardo Habkost, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Paolo Bonzini

On Mon, Oct 17, 2016 at 04:50:09PM +0200, Radim Krčmář wrote:
> 2016-10-17 07:47-0200, Marcelo Tosatti:
> > On Fri, Oct 14, 2016 at 06:20:31PM -0300, Eduardo Habkost wrote:
> >> I have been wondering: should we allow live migration with the
> >> invtsc flag enabled, if TSC scaling is available on the
> >> destination?
> >
> > TSC scaling and invtsc flag, yes.
>
> Yes, if we have well synchronized time between hosts, then we might be
> able to migrate with a TSC shift that cannot be perceived by the guest.

Even if the guest can't detect the TSC difference (relative to realtime),
I suppose the TSC should be advanced to account for the time the guest
was stopped during migration (so that the TSC appears to have incremented
at a "constant rate").

> Unless the VM also has a migratable assigned PCI device that uses ART,
> because we have no protocol to update the setting of ART (in CPUID), so
> we should keep migration forbidden then.

What is the use case for ART again? (I need to catch up on that.)

>
> >> For reference, this is what the Intel SDM says about invtsc:
> >>
> >>   The time stamp counter in newer processors may support an
> >>   enhancement, referred to as invariant TSC. Processor’s support
> >>   for invariant TSC is indicated by CPUID.80000007H:EDX[8].
> >>
> >>   The invariant TSC will run at a constant rate in all ACPI P-,
> >>   C-. and T-states. This is the architectural behavior moving
> >>   forward. On processors with invariant TSC support, the OS may
> >>   use the TSC for wall clock timer services (instead of ACPI or
> >>   HPET timers). TSC reads are much more efficient and do not
> >>   incur the overhead associated with a ring transition or access
> >>   to a platform resource.
> >
> > Yes. The blockage happened for different reasons:
> >
> > 1) Migration: to host with different TSC frequency.
>
> We shouldn't have done this even now when emulating anything newer than
> Pentium 4, because those CPUs have constant TSC, which only lacks the
> guarantee that it doesn't stop in deep C-states:
>
>   For [a list of processors we emulate]: the time-stamp counter
>   increments at a constant rate. That rate may be set by the maximum
>   core-clock to bus-clock ratio of the processor or may be set by the
>   maximum resolved frequency at which the processor is booted. The
>   maximum resolved frequency may differ from the processor base
>   frequency, see Section 18.18.2 for more detail.
>   On certain processors,
>   the TSC frequency may not be the same as the frequency in the brand
>   string.
>
>   The specific processor configuration determines the behavior. Constant
>   TSC behavior ensures that the duration of each clock tick is uniform
>   and supports the use of the TSC as a wall clock timer even if the
>   processor core changes frequency. This is the architectural behavior
>   moving forward.
>
> Invariant TSC is more useful, though, so more applications would break
> when migrating to a different TSC frequency.
>
> > 2) Savevm: It is not safe to use the TSC for wall clock timer
> > services.
>
> With constant TSC, we could argue that a shift to deep C-state happened
> and paused TSC, which is not a good behavior, but somewhat defensible.
>
> > By allowing savevm, you make a commitment to allow a feature
> > at the expense of not complying with the spec (specifically the "
> > the OS may use the TSC for wall clock timer services", because the
> > TSC stops relative to realtime for the duration of the savevm stop
> > window).
>
> Yep, we should at least guesstimate the TSC to allow the guest to resume
> with as small TSC-shift as possible and check that hosts were somewhat
> synchronized with UTC (or something we choose for time).

There are two options for savevm:

Option 1) Stop the TSC for savevm duration.

Option 2) Advance TSC to match realtime (this is known to overflow Linux
timekeeping though).

>
> > But since Linux guests use kvmclock and Windows guests use Hyper-V
> > enlightenment, it should be fine to disable 2).
> >
> > There is a bug open for this, btw:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1353073
>
> These people should be happy with just live-migrations, so can't we just
> keep savevm forbidden?

Don't see why.
Perhaps savevm should be considered a "special type of operation" that
deviates from baremetal behaviour: if the user does savevm, then they
know that the TSC does not count "at a constant rate" (so savevm breaks
invariant TSC behaviour).
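
To make the two savevm options concrete, here is a rough sketch in C of
what the guest would see on resume. This is not QEMU or KVM code: the
function names, the `advance` flag, and the assumption that the stop time
is known in nanoseconds and the guest TSC frequency in kHz are all
hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU/KVM function): convert a wall-clock
 * interval in nanoseconds into TSC ticks, where tsc_khz is the guest TSC
 * frequency in kHz, i.e. ticks per millisecond.
 */
static uint64_t ns_to_tsc_ticks(uint64_t ns, uint64_t tsc_khz)
{
    /* A zero frequency means the caller never learned the TSC rate. */
    assert(tsc_khz != 0);

    /*
     * Handle whole milliseconds and the sub-millisecond remainder
     * separately, so the intermediate products stay well within 64 bits
     * for realistic stop times and frequencies.
     */
    uint64_t whole_ms = ns / 1000000u;
    uint64_t rem_ns = ns % 1000000u;

    return whole_ms * tsc_khz + rem_ns * tsc_khz / 1000000u;
}

/*
 * Hypothetical model of the two options:
 *   advance == false: Option 1, the TSC stops for the savevm duration.
 *   advance == true:  Option 2, the TSC is advanced to match realtime.
 */
static uint64_t guest_tsc_on_resume(uint64_t tsc_at_stop, uint64_t stopped_ns,
                                    uint64_t tsc_khz, bool advance)
{
    if (!advance) {
        return tsc_at_stop;
    }
    return tsc_at_stop + ns_to_tsc_ticks(stopped_ns, tsc_khz);
}
```

For example, with a 3 GHz guest TSC (tsc_khz = 3000000) and a one second
stop, Option 2 adds 3000000000 ticks, while Option 1 leaves the TSC at
its saved value. The sketch says nothing about the Linux timekeeping
overflow mentioned for Option 2; that would have to be handled in the
guest.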