linux-arm-kernel.lists.infradead.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	Sean Christopherson <seanjc@google.com>,
	Marc Zyngier <maz@kernel.org>, Peter Shier <pshier@google.com>,
	Jim Mattson <jmattson@google.com>,
	David Matlack <dmatlack@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>,
	James Morse <james.morse@arm.com>,
	Alexandru Elisei <Alexandru.Elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Andrew Jones <drjones@redhat.com>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
Date: Mon, 4 Oct 2021 11:30:11 -0300
Message-ID: <20211004143011.GA72593@fuller.cnet>
In-Reply-To: <CAOQ_Qsj9ObSakmqgFQf598VscQWDh_Cq3WFqF7EpKqe2+RRgVg@mail.gmail.com>

On Fri, Oct 01, 2021 at 12:33:28PM -0700, Oliver Upton wrote:
> Marcelo,
> 
> On Fri, Oct 1, 2021 at 12:11 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Fri, Oct 01, 2021 at 05:12:20PM +0200, Paolo Bonzini wrote:
> > > On 01/10/21 12:32, Marcelo Tosatti wrote:
> > > > > > +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
> > > > > > +   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).
> > > > > > + [...]
> > > > > > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
> > > > > > +   (k_0) and realtime nanoseconds (r_0) in their respective fields.
> > > > > > +   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
> > > > > > +   structure. KVM will advance the VM's kvmclock to account for elapsed
> > > > > > +   time since recording the clock values.
> > > >
> > > > You can't advance both kvmclock (kvmclock_offset variable) and the
> > > > TSCs, which would be double counting.
> > > >
> > > > So you have to either add the elapsed realtime (1) between
> > > > KVM_GET_CLOCK to kvmclock (which this patch is doing), or to the
> > > > > TSCs. If you do both, there is double counting. Am I missing
> > > > something?
> > >
> > > Probably one of these two (but it's worth pointing out both of them):
> > >
> > > 1) the attribute that's introduced here *replaces*
> > > KVM_SET_MSR(MSR_IA32_TSC), so the TSC is not added.
> > >
> > > 2) the adjustment formula later in the algorithm does not care about how
> > > much time passed between step 1 and step 4.  It just takes two well
> > > known (TSC, kvmclock) pairs, and uses them to ensure the guest TSC is
> > > the same on the destination as if the guest was still running on the
> > > source.  It is irrelevant that one of them is before migration and one
> > > > is after; all that matters is that one is on the source and one is on the
> > > destination.
> >
> > > OK, so it still relies on the NTP daemon to fix the CLOCK_REALTIME delay
> > > which is introduced during migration (which is what I would guess is
> > > the lower hanging fruit) (for guests using TSC).
> 
> The series gives userspace the ability to modify the guest's
> perception of the TSC in whatever way it sees fit. The algorithm in
> the documentation provides a suggestion to userspace on how to do
> exactly that. I kept that advancement logic out of the kernel because
> IMO it is an implementation detail: users have differing opinions on
> how clocks should behave across a migration and KVM shouldn't have any
> baked-in rules around it.

OK, I was just trying to visualize how this would work with Linux guests
running under QEMU.

> 
> At the same time, userspace can choose to _not_ jump the TSC and use
> the available interfaces to just migrate the existing state of the
> TSCs.
> 
> When I had initially proposed this series upstream, Paolo astutely
> pointed out that there was no good way to get a (CLOCK_REALTIME, TSC)
> pairing, which is critical for the TSC advancement algorithm in the
> documentation. Google's best way to get (CLOCK_REALTIME, TSC) exists
> in userspace [1], hence the missing kvm clock changes. So, in all, the
> spirit of the KVM clock changes is to provide missing UAPI around the
> clock/TSC, with the side effect of changing the guest-visible value.
> 
> [1] https://cloud.google.com/spanner/docs/true-time-external-consistency
> 
> > My point was that, by advancing the _TSC value_ by:
> >
> > T0. stop guest vcpus    (source)
> > T1. KVM_GET_CLOCK       (source)
> > T2. KVM_SET_CLOCK       (destination)
> > T3. Write guest TSCs    (destination)
> > T4. resume guest        (destination)
> >
> > new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
> >
> > t_0:    host TSC at KVM_GET_CLOCK time.
> > off_n:  TSC offset at vcpu-n (as long as no guest TSC writes are performed,
> > TSC offset is fixed).
> > ...
> >
> > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
> > +   (k_0) and realtime nanoseconds (r_0) in their respective fields.
> > +   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
> > +   structure. KVM will advance the VM's kvmclock to account for elapsed
> > +   time since recording the clock values.
> >
> > Only kvmclock is advanced (by passing r_0). But a guest might not use kvmclock
> > (hopefully modern guests on modern hosts will use TSC clocksource,
> > whose clock_gettime is faster... some people are using that already).
> >
> 
> Hopefully the above explanation made it clearer how the TSCs are
> supposed to get advanced, and why it isn't done in the kernel.
> 
> > At some point QEMU should enable invariant TSC flag by default?
> >
> > That said, the point is: why not advance the _TSC_ values
> > (instead of kvmclock nanoseconds), as doing so would reduce
> > the "the CLOCK_REALTIME delay which is introduced during migration"
> > for both kvmclock users and modern tsc clocksource users.
> >
> > > So yes, I also like this patchset, but would like it even more
> > if it fixed the case above as well (and not sure whether adding
> > the migration delta to KVMCLOCK makes it harder to fix TSC case
> > later).
> >
> > > Perhaps we can add to step 6 something like:
> > >
> > > > > +6. Adjust the guest TSC offsets for every vCPU to account for (1) time
> > > > > +   elapsed since recording state and (2) difference in TSCs between the
> > > > > +   source and destination machine:
> > > > > +
> > > > > +   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
> > > > > +
> > >
> > > "off + t - k * freq" is the guest TSC value corresponding to a time of 0
> > > in kvmclock.  The above formula ensures that it is the same on the
> > > destination as it was on the source.
> > >
> > > Also, the names are a bit hard to follow.  Perhaps
> > >
> > >       t_0             tsc_src
> > >       t_1             tsc_dest
> > >       k_0             guest_src
> > >       k_1             guest_dest
> > >       r_0             host_src
> > >       off_n           ofs_src[i]
> > >       new_off_n       ofs_dest[i]
> > >
> > > Paolo
> > >
> 
> Yeah, sounds good to me. Shall I respin the whole series from what you
> have in kvm/queue, or just send you the bits and pieces that ought to
> be applied?
> 
> --
> Thanks,
> Oliver
> 
> 
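
For readers following the arithmetic, the offset formula quoted above
(new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1) can be sketched in a
few lines of Python. This is only an illustration of the formula as it is
written in the thread, not code from the series. The function names and the
tsc_khz parameter are assumptions made for this example: the thread says
only "freq", so the sketch assumes the guest TSC frequency is given in kHz
and that the kvmclock values are in nanoseconds.

```python
def dest_tsc_offset(t_0, off_n, k_0, k_1, t_1, tsc_khz):
    """Hypothetical helper: TSC offset to write on the destination.

    t_0, t_1 : host TSC sampled on the source / destination
    off_n    : TSC offset of vCPU n on the source
    k_0, k_1 : kvmclock nanoseconds sampled on the source / destination
    tsc_khz  : guest TSC frequency in kHz (assumed unit, see lead-in)
    """
    # Convert elapsed kvmclock nanoseconds into guest TSC ticks.
    # ticks = ns * (kHz * 1000 / 1e9) = ns * kHz / 1e6
    elapsed_ticks = (k_1 - k_0) * tsc_khz // 1_000_000
    return t_0 + off_n + elapsed_ticks - t_1


def guest_tsc_at_kvmclock_zero(off, t, k, tsc_khz):
    """Paolo's invariant "off + t - k * freq": the guest TSC value
    that corresponds to a kvmclock reading of 0."""
    return off + t - k * tsc_khz // 1_000_000


# Made-up sample values: a 3 GHz guest TSC and 200 ms between the
# source and destination samples.
tsc_khz = 3_000_000
t_0, off_n, k_0 = 1_000_000, -400_000, 5_000_000_000
t_1, k_1 = 50_000, 5_200_000_000

new_off_n = dest_tsc_offset(t_0, off_n, k_0, k_1, t_1, tsc_khz)

# Guest-visible TSC = host TSC + offset, on either machine.
guest_tsc_src = t_0 + off_n
guest_tsc_dest = t_1 + new_off_n
```

With these sample values the guest TSC on the destination is ahead of the
source reading by exactly the number of ticks in 200 ms, and the
"off + t - k * freq" quantity is the same on both sides, which is the
property described in the quoted text. The integer division is exact here
only because the sample frequency is a whole number of ticks per
nanosecond; a real implementation would have to choose its rounding.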


