Re: [PATCH] x86/kvm: fix condition to update kvm master clocks

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	kvm@vger.kernel.org, "Denis V. Lunev" <den@openvz.org>,
	Owen Hofmann <osh@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] x86/kvm: fix condition to update kvm master clocks
Date: Tue, 14 Jun 2016 19:11:27 -0300	[thread overview]
Message-ID: <20160614221127.GA10663@amt.cnet> (raw)
In-Reply-To: <20160613170747.wlkxnt7wmdjinfbu@rkaganb.sw.ru>

On Mon, Jun 13, 2016 at 08:07:47PM +0300, Roman Kagan wrote:
> On Thu, Jun 09, 2016 at 03:25:02PM -0300, Marcelo Tosatti wrote:
> > On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> > > On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > > > Ok, its not feasible to keep both REF_CLOCK (MSR) and shared memory
> > > > (REF_PAGE) clocks in sync. Even if the frequency correction is the same
> > > > for both, the kernel updates monotonic clock differently than the
> > > > stated frequency that is:
> > > > 
> > > >     monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq 
> > > > 
> > > > So the best solution IMO is to: 
> > > > 
> > > >     reads of guest clock = max(shared memory clock, get_kernel_ns() +
> > > >                                kvmclock_offset)
> > > > 
> > > > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > > > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> > > > 
> > > > Unless someone has a better idea, Roman, can you update your patch to
> > > > include such solution? for kvm_get_time_and_clockread, can fast forward
> > > > kvmclock_offset so that 
> > > > 
> > > > kvmclock_offset + get_kernel_ns() = shared memory clock
> > > 
> > > I'm not sure I understand what you mean.
> > > 
> > > ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> > > at the time kvm_get_time_and_clockread() runs they are in sync by
> > > definition.  They can diverge later due to different whole number math
> > > applied.
> > 
> > Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
> > read from the guest). 
> > 
> > 
> > Add wrapper around get_kernel_ns + kvmclock_offset reads:
> > 
> > Record somewhere the last returned (last_returned_guestclock).
> > 
> > u64 read_guest_clock(struct kvm *kvm)
> > {
> >       mutex_lock(&guest_clock_mutex);
> >       ret = get_kernel_ns() + kvmclock_offset;
> >       kvm->arch.last_returned_guestclock = ret;
> >       mutex_unlock(&guest_clock_mutex);
> > }
> > 
> > 
> > Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):
> > 
> > 1. read guest shared memory = smval.
> > 2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
> > 3. if (kclock < smval) 
> >         kvmclock_offset += smval - kclock
> > 
> > 
> > Two possibilites for clocks state:
> > 
> > P1. shared memory clock > get_kernel_ns() + kvmclock_offset
> > P2. shared memory clock < get_kernel_ns() + kvmclock_offset
> > 
> > Two possibilites for guest behaviour:
> > G1. a = shared memory clock read;
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > G1/P1:
> > 
> >     a = shared memory clock read (previous read, stored in memory)
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > After sync code above: Note smval > a, so b = smval > a
> > 
> > G1/P2:
> > 
> >     a = shared memory clock read (previous read, stored in memory)
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > a < b, fine.
> > 
> > G2 (second possibility for guest behaviour)
> >     a = get_kernel_ns() + kvmclock_offset
> >     b = shared memory clock read
> > 
> > G2/P1: fine, b > a.
> > 
> > G2/P2: 
> >     a = get_kernel_ns() + kvmclock_offset >
> >     b = shared memory clock read
> > 
> > So we have to either reduce get_kernel_ns() + kvmclock_offset so that 
> > b is larger or 'stop get_kernel_ns() + kvmclock_offset'.
> > 
> > Can make get_kernel_ns() + kvmclock_offset be as small as
> > last_returned_guestclock (otherwise users of get_kernel_ns() +
> > kvmclock_offset can see time backwards).
> > 
> >     0. mutex_lock(&guest_clock_mutex);
> >     01. getkernelns = get_kernel_ns();
> >     1. read guest shared memory = smval.
> >     2. kclock = getkernelns + kvmclock_offset
> >     3. if (kclock > smval) 
> >         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
> >                               kclock - smval)
> >     4. kclock = getkernelns + kvmclock_offset
> >     5. if (kclock > smval) {
> >         schedule_timeout(kclock - smval);
> >         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
> >                               kclock - smval)
> >        }
> >     6. mutex_unlock(&guest_clock_mutex);
> > 
> > That works, right?
> 
> I wouldn't say so.
> 
> First, I don't think changing kvmclock_offset is allowed other than
> through ioctl(KVM_SET_CLOCK); 

Says who? 

> it violates everybody's expectation that
> this is the difference between the host and the guest clocks.

Who's expectation?

userspace is supposed to, when the guest starts issue KVM_SET_CLOCK.

When the guest stops, issue KVM_GET_CLOCK (save value), and issue again KVM_SET_CLOCK
(with previously saved value).

Can't see what the problem is.

> Second, since masterclock updates essentially do that synchronization, I
> think that instead of all that compilcation we can just return
> host-side-calculated value of REFERENCE_TSC_PAGE clock from
> rdmsr(REF_CLOCK) if masterclock is enabled.

Yes, that is simpler, but i also wanted to deal with the potential
backwards event that happens at masterclock update time
(that is, if you are going to update refclock page from 
get_kernel_ns() + kvmclock_offset, need to avoid backwards event
in case 

    get_kernel_ns() + kvmclock_offset < refclock.

Which is the problem in the G2/P2 case above.

But that can be done separately, so yeah sure, returning 
host-side-calculated value of REFERENCE_TSC_PAGE clock from
rdmsr(REF_CLOCK) deals with the monotonicity issue.

> > > There's also a problem that there can be arbitrary amount of time
> > > between assigning the return value for guest's rdmsr and actually
> > > entering the guest to deliver that value.
> > 
> > Don't think that matters, see the 4 cases above.
> 
> It does in the sense that between the point where we calculate the value
> we're about to return from rdmsr(REF_CLOCK), and the time the guest will
> actually see it, another vCPU can read and even update the reference tsc
> page. 

Not sure i see, can you outline the problem in the style of the 
4 cases above? (time diagrams).

>  However, as I wrote in another message in this thread, there's
> no way to guarantee clock reads to be monotonic across several vCPUs;
> OTOH that doesn't violate the monotonicity on a specific vCPU.

Yes it does and that has been a concern with Linux for a long time.
See time-warp-test.c testcase for example.

> So I'll probably just add another patch adjusting rdmsr(REF_CLOCK) to
> return shared memory clock if it's enabled.
> 
> Roman.

next prev parent reply	other threads:[~2016-06-14 22:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-26 14:49 [PATCH] x86/kvm: fix condition to update kvm master clocks Roman Kagan
2016-05-26 20:19 ` Radim Krčmář
2016-05-27 17:28   ` Roman Kagan
2016-05-27 18:11     ` Radim Krčmář
2016-05-27 18:46       ` Roman Kagan
2016-05-27 19:29         ` Radim Krčmář
2016-05-29 23:38 ` Marcelo Tosatti
2016-06-09  3:27   ` Marcelo Tosatti
2016-06-09  3:45     ` Marcelo Tosatti
2016-06-09 12:09     ` Roman Kagan
2016-06-09 18:25       ` Marcelo Tosatti
2016-06-09 19:19         ` Roman Kagan
2016-06-13 17:07         ` Roman Kagan
2016-06-14 22:11           ` Marcelo Tosatti [this message]
2016-06-13 17:19         ` Roman Kagan
2016-06-17 22:21           ` Marcelo Tosatti
2016-06-20 17:22             ` Roman Kagan
2016-06-20 21:29               ` Marcelo Tosatti
2016-06-21 14:40                 ` Roman Kagan
2016-06-21 21:28                   ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160614221127.GA10663@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=den@openvz.org \
    --cc=kvm@vger.kernel.org \
    --cc=osh@google.com \
    --cc=pbonzini@redhat.com \
    --cc=rkagan@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.