Re: [PATCH] x86/kvm: fix condition to update kvm master clocks

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	kvm@vger.kernel.org, "Denis V. Lunev" <den@openvz.org>,
	Owen Hofmann <osh@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] x86/kvm: fix condition to update kvm master clocks
Date: Thu, 9 Jun 2016 15:25:02 -0300	[thread overview]
Message-ID: <20160609182501.GA24024@amt.cnet> (raw)
In-Reply-To: <20160609120945.GB2570@rkaganb.sw.ru>

On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > Ok, its not feasible to keep both REF_CLOCK (MSR) and shared memory
> > (REF_PAGE) clocks in sync. Even if the frequency correction is the same
> > for both, the kernel updates monotonic clock differently than the
> > stated frequency that is:
> > 
> >     monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq 
> > 
> > So the best solution IMO is to: 
> > 
> >     reads of guest clock = max(shared memory clock, get_kernel_ns() +
> >                                kvmclock_offset)
> > 
> > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> > 
> > Unless someone has a better idea, Roman, can you update your patch to
> > include such solution? for kvm_get_time_and_clockread, can fast forward
> > kvmclock_offset so that 
> > 
> > kvmclock_offset + get_kernel_ns() = shared memory clock
> 
> I'm not sure I understand what you mean.
> 
> ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> at the time kvm_get_time_and_clockread() runs they are in sync by
> definition.  They can diverge later due to different whole number math
> applied.

Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
read from the guest). 


Add wrapper around get_kernel_ns + kvmclock_offset reads:

Record somewhere the last returned (last_returned_guestclock).

u64 read_guest_clock(struct kvm *kvm)
{
      mutex_lock(&guest_clock_mutex);
      ret = get_kernel_ns() + kvmclock_offset;
      kvm->arch.last_returned_guestclock = ret;
      mutex_unlock(&guest_clock_mutex);
}


Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):

1. read guest shared memory = smval.
2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
3. if (kclock < smval) 
        kvmclock_offset += smval - kclock


Two possibilites for clocks state:

P1. shared memory clock > get_kernel_ns() + kvmclock_offset
P2. shared memory clock < get_kernel_ns() + kvmclock_offset

Two possibilites for guest behaviour:
G1. a = shared memory clock read;
    b = get_kernel_ns() + kvmclock_offset

G1/P1:

    a = shared memory clock read (previous read, stored in memory)
    b = get_kernel_ns() + kvmclock_offset

After sync code above: Note smval > a, so b = smval > a

G1/P2:

    a = shared memory clock read (previous read, stored in memory)
    b = get_kernel_ns() + kvmclock_offset

a < b, fine.

G2 (second possibility for guest behaviour)
    a = get_kernel_ns() + kvmclock_offset
    b = shared memory clock read

G2/P1: fine, b > a.

G2/P2: 
    a = get_kernel_ns() + kvmclock_offset >
    b = shared memory clock read

So we have to either reduce get_kernel_ns() + kvmclock_offset so that 
b is larger or 'stop get_kernel_ns() + kvmclock_offset'.

Can make get_kernel_ns() + kvmclock_offset be as small as
last_returned_guestclock (otherwise users of get_kernel_ns() +
kvmclock_offset can see time backwards).

    0. mutex_lock(&guest_clock_mutex);
    01. getkernelns = get_kernel_ns();
    1. read guest shared memory = smval.
    2. kclock = getkernelns + kvmclock_offset
    3. if (kclock > smval) 
        kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
                              kclock - smval)
    4. kclock = getkernelns + kvmclock_offset
    5. if (kclock > smval) {
        schedule_timeout(kclock - smval);
        kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
                              kclock - smval)
       }
    6. mutex_unlock(&guest_clock_mutex);

That works, right?

I'll code that patch next week if you don't beat me to it.


> There's also a problem that there can be arbitrary amount of time
> between assigning the return value for guest's rdmsr and actually
> entering the guest to deliver that value.

Don't think that matters, see the 4 cases above.

> In general I'm starting to feel the shared memory clock is trying to
> provide stronger guarantees than really useful.  E.g. there's no such
> thing as synchronous TSC between vCPUs in a virtual machine, so every
> guest assuming it is broken; 

There is, the TSC is monotonic between pCPUs:

    pCPU1          |       pCPU2

1.  a = read tsc
2.                      b = read tsc.

> in reality that means that every sane guest
> must tolerate certain violations of monotonicity when multiple CPUs are
> used for timekeeping.  I wonder if this consideration can allow for some
> simplification of the paravirtual clock code...

I think applications can fail.

next prev parent reply	other threads:[~2016-06-09 18:25 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-26 14:49 [PATCH] x86/kvm: fix condition to update kvm master clocks Roman Kagan
2016-05-26 20:19 ` Radim Krčmář
2016-05-27 17:28   ` Roman Kagan
2016-05-27 18:11     ` Radim Krčmář
2016-05-27 18:46       ` Roman Kagan
2016-05-27 19:29         ` Radim Krčmář
2016-05-29 23:38 ` Marcelo Tosatti
2016-06-09  3:27   ` Marcelo Tosatti
2016-06-09  3:45     ` Marcelo Tosatti
2016-06-09 12:09     ` Roman Kagan
2016-06-09 18:25       ` Marcelo Tosatti [this message]
2016-06-09 19:19         ` Roman Kagan
2016-06-13 17:07         ` Roman Kagan
2016-06-14 22:11           ` Marcelo Tosatti
2016-06-13 17:19         ` Roman Kagan
2016-06-17 22:21           ` Marcelo Tosatti
2016-06-20 17:22             ` Roman Kagan
2016-06-20 21:29               ` Marcelo Tosatti
2016-06-21 14:40                 ` Roman Kagan
2016-06-21 21:28                   ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160609182501.GA24024@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=den@openvz.org \
    --cc=kvm@vger.kernel.org \
    --cc=osh@google.com \
    --cc=pbonzini@redhat.com \
    --cc=rkagan@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.