public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Radim Krcmar <rkrcmar@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm list <kvm@vger.kernel.org>, stable <stable@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: x86: kvm: Revert "remove sched notifier for cross-cpu migrations"
Date: Thu, 26 Mar 2015 21:31:44 +0100	[thread overview]
Message-ID: <20150326203143.GA4924@potion.brq.redhat.com> (raw)
In-Reply-To: <CALCETrWpSFcPN+-e6h+BD9djgtXSDGPNxW=ynSnmo0dEWxaZRg@mail.gmail.com>

2015-03-26 11:51-0700, Andy Lutomirski:
> On Thu, Mar 26, 2015 at 4:29 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Mar 25, 2015 at 04:22:03PM -0700, Andy Lutomirski wrote:
> >> Suppose we start out with all vcpus agreeing on their pvti and perfect
> >> invariant TSCs.  Now the host updates its frequency (due to NTP or
> >> whatever).  KVM updates vcpu 0's pvti.  Before KVM updates vcpu 1's
> >> pvti, guest code on vcpus 0 and 1 see synced TSCs but different pvti.
> >> They'll disagree on the time, and one of them will be ahead until vcpu
> >> 1's pvti gets updated.
> >
> > The masterclock scheme enforces the same system_timestamp/tsc_timestamp pairs
> > to be visible at one time, for all vcpus.
> >
> >
> >  * That is, when timespec0 != timespec1, M < N. Unfortunately that is
> >  * not
> >  * always the case (the difference between two distinct xtime instances
> >  * might be smaller then the difference between corresponding TSC reads,
> >  * when updating guest vcpus pvclock areas).
> >  *
> >  * To avoid that problem, do not allow visibility of distinct
> >  * system_timestamp/tsc_timestamp values simultaneously: use a master
> >  * copy of host monotonic time values. Update that master copy
> >  * in lockstep.
> 
> Yuck.  So we have per cpu timing data, but the protocol is only usable
> for monotonic timing because we forcibly freeze all vcpus when we
> update the nominally per cpu data.
> 
> The obvious guest implementations are still unnecessarily slow,
> though.  It would be nice if the guest could get away without using
> any getcpu operation at all.
> 
> Even if we fixed the host to increment version as advertised, I think
> we can't avoid two getcpu ops.  We need one before rdtsc to figure out
> which pvti to look at,

Yes.

>                        and we need another to make sure that we were
> actually on that cpu at the time we did rdtsc.  (Rdtscp doesn't help
> -- we need to check version before rdtsc, and we don't know what
> version to check until we do a getcpu.).

Exactly, reading cpuid after rdtsc doesn't do that though, we could have
migrated back between those reads.
rtdscp would allow us to check that we read tsc of pvti's cpu.
(It doesn't get rid of that first read.)

>                                          The migration hook has the
> same issue -- we need to check the migration count, then confirm we're
> on that cpu, then check the migration count again, and we can't do
> that until we know what cpu we're on.

True;  the revert has a bug -- we need to check cpuid for the second
time before rdtsc.  (Migration hook is there just because we don't know
which cpu executed rdtsc.)

> If, on the other hand, we could rely on having all of these things in
> sync, then this complication goes away, and we go down from two getcpu
> ops to zero.

(Yeah, we should look what are the drawbacks of doing it differently.)

  reply	other threads:[~2015-03-26 20:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-23 23:21 x86: kvm: Revert "remove sched notifier for cross-cpu migrations" Marcelo Tosatti
2015-03-23 23:30 ` Andy Lutomirski
2015-03-24 15:34 ` Radim Krčmář
2015-03-24 22:33   ` Andy Lutomirski
2015-03-25 11:08     ` Radim Krčmář
2015-03-25 12:52       ` Radim Krčmář
2015-03-25 21:28         ` Marcelo Tosatti
2015-03-25 22:33           ` Andy Lutomirski
2015-03-25 22:41             ` Marcelo Tosatti
2015-03-25 22:48               ` Andy Lutomirski
2015-03-25 23:13                 ` Marcelo Tosatti
2015-03-25 23:22                   ` Andy Lutomirski
2015-03-26 11:29                     ` Marcelo Tosatti
2015-03-26 18:51                       ` Andy Lutomirski
2015-03-26 20:31                         ` Radim Krcmar [this message]
2015-03-26 20:58                           ` Andy Lutomirski
2015-03-26 22:22                             ` Andy Lutomirski
2015-03-26 22:56                             ` Marcelo Tosatti
2015-03-26 23:09                               ` Andy Lutomirski
2015-03-26 23:22                                 ` Marcelo Tosatti
2015-03-26 23:28                                   ` Andy Lutomirski
2015-03-26 23:38                                     ` Marcelo Tosatti
2015-03-26 18:47       ` Andy Lutomirski
2015-03-26 20:10         ` Radim Krčmář
2015-03-26 20:52           ` Paolo Bonzini
2015-03-24 22:59   ` Marcelo Tosatti
2015-03-25 11:09     ` Radim Krčmář
2015-03-25 13:06 ` Radim Krčmář
2015-03-26 20:59 ` Radim Krčmář
2015-03-26 22:22   ` Marcelo Tosatti
2015-03-26 22:24     ` Andy Lutomirski
2015-03-26 22:40       ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150326203143.GA4924@potion.brq.redhat.com \
    --to=rkrcmar@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox