Re: [PATCH] x86/kvm: fix condition to update kvm master clocks

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
	kvm@vger.kernel.org, "Denis V. Lunev" <den@openvz.org>,
	Owen Hofmann <osh@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] x86/kvm: fix condition to update kvm master clocks
Date: Tue, 14 Jun 2016 19:11:27 -0300	[thread overview]
Message-ID: <20160614221127.GA10663@amt.cnet> (raw)
In-Reply-To: <20160613170747.wlkxnt7wmdjinfbu@rkaganb.sw.ru>

On Mon, Jun 13, 2016 at 08:07:47PM +0300, Roman Kagan wrote:
> On Thu, Jun 09, 2016 at 03:25:02PM -0300, Marcelo Tosatti wrote:
> > On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> > > On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > > > Ok, its not feasible to keep both REF_CLOCK (MSR) and shared memory
> > > > (REF_PAGE) clocks in sync. Even if the frequency correction is the same
> > > > for both, the kernel updates monotonic clock differently than the
> > > > stated frequency that is:
> > > > 
> > > >     monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq 
> > > > 
> > > > So the best solution IMO is to: 
> > > > 
> > > >     reads of guest clock = max(shared memory clock, get_kernel_ns() +
> > > >                                kvmclock_offset)
> > > > 
> > > > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > > > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> > > > 
> > > > Unless someone has a better idea, Roman, can you update your patch to
> > > > include such solution? for kvm_get_time_and_clockread, can fast forward
> > > > kvmclock_offset so that 
> > > > 
> > > > kvmclock_offset + get_kernel_ns() = shared memory clock
> > > 
> > > I'm not sure I understand what you mean.
> > > 
> > > ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> > > at the time kvm_get_time_and_clockread() runs they are in sync by
> > > definition.  They can diverge later due to different whole number math
> > > applied.
> > 
> > Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
> > read from the guest). 
> > 
> > 
> > Add wrapper around get_kernel_ns + kvmclock_offset reads:
> > 
> > Record somewhere the last returned (last_returned_guestclock).
> > 
> > u64 read_guest_clock(struct kvm *kvm)
> > {
> >       mutex_lock(&guest_clock_mutex);
> >       ret = get_kernel_ns() + kvmclock_offset;
> >       kvm->arch.last_returned_guestclock = ret;
> >       mutex_unlock(&guest_clock_mutex);
> > }
> > 
> > 
> > Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):
> > 
> > 1. read guest shared memory = smval.
> > 2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
> > 3. if (kclock < smval) 
> >         kvmclock_offset += smval - kclock
> > 
> > 
> > Two possibilites for clocks state:
> > 
> > P1. shared memory clock > get_kernel_ns() + kvmclock_offset
> > P2. shared memory clock < get_kernel_ns() + kvmclock_offset
> > 
> > Two possibilites for guest behaviour:
> > G1. a = shared memory clock read;
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > G1/P1:
> > 
> >     a = shared memory clock read (previous read, stored in memory)
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > After sync code above: Note smval > a, so b = smval > a
> > 
> > G1/P2:
> > 
> >     a = shared memory clock read (previous read, stored in memory)
> >     b = get_kernel_ns() + kvmclock_offset
> > 
> > a < b, fine.
> > 
> > G2 (second possibility for guest behaviour)
> >     a = get_kernel_ns() + kvmclock_offset
> >     b = shared memory clock read
> > 
> > G2/P1: fine, b > a.
> > 
> > G2/P2: 
> >     a = get_kernel_ns() + kvmclock_offset >
> >     b = shared memory clock read
> > 
> > So we have to either reduce get_kernel_ns() + kvmclock_offset so that 
> > b is larger or 'stop get_kernel_ns() + kvmclock_offset'.
> > 
> > Can make get_kernel_ns() + kvmclock_offset be as small as
> > last_returned_guestclock (otherwise users of get_kernel_ns() +
> > kvmclock_offset can see time backwards).
> > 
> >     0. mutex_lock(&guest_clock_mutex);
> >     01. getkernelns = get_kernel_ns();
> >     1. read guest shared memory = smval.
> >     2. kclock = getkernelns + kvmclock_offset
> >     3. if (kclock > smval) 
> >         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
> >                               kclock - smval)
> >     4. kclock = getkernelns + kvmclock_offset
> >     5. if (kclock > smval) {
> >         schedule_timeout(kclock - smval);
> >         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
> >                               kclock - smval)
> >        }
> >     6. mutex_unlock(&guest_clock_mutex);
> > 
> > That works, right?
> 
> I wouldn't say so.
> 
> First, I don't think changing kvmclock_offset is allowed other than
> through ioctl(KVM_SET_CLOCK); 

Says who? 

> it violates everybody's expectation that
> this is the difference between the host and the guest clocks.

Who's expectation?

userspace is supposed to, when the guest starts issue KVM_SET_CLOCK.

When the guest stops, issue KVM_GET_CLOCK (save value), and issue again KVM_SET_CLOCK
(with previously saved value).

Can't see what the problem is.

> Second, since masterclock updates essentially do that synchronization, I
> think that instead of all that compilcation we can just return
> host-side-calculated value of REFERENCE_TSC_PAGE clock from
> rdmsr(REF_CLOCK) if masterclock is enabled.

Yes, that is simpler, but i also wanted to deal with the potential
backwards event that happens at masterclock update time
(that is, if you are going to update refclock page from 
get_kernel_ns() + kvmclock_offset, need to avoid backwards event
in case 

    get_kernel_ns() + kvmclock_offset < refclock.

Which is the problem in the G2/P2 case above.

But that can be done separately, so yeah sure, returning 
host-side-calculated value of REFERENCE_TSC_PAGE clock from
rdmsr(REF_CLOCK) deals with the monotonicity issue.

> > > There's also a problem that there can be arbitrary amount of time
> > > between assigning the return value for guest's rdmsr and actually
> > > entering the guest to deliver that value.
> > 
> > Don't think that matters, see the 4 cases above.
> 
> It does in the sense that between the point where we calculate the value
> we're about to return from rdmsr(REF_CLOCK), and the time the guest will
> actually see it, another vCPU can read and even update the reference tsc
> page. 

Not sure i see, can you outline the problem in the style of the 
4 cases above? (time diagrams).

>  However, as I wrote in another message in this thread, there's
> no way to guarantee clock reads to be monotonic across several vCPUs;
> OTOH that doesn't violate the monotonicity on a specific vCPU.

Yes it does and that has been a concern with Linux for a long time.
See time-warp-test.c testcase for example.

> So I'll probably just add another patch adjusting rdmsr(REF_CLOCK) to
> return shared memory clock if it's enabled.
> 
> Roman.

next prev parent reply	other threads:[~2016-06-14 22:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-26 14:49 [PATCH] x86/kvm: fix condition to update kvm master clocks Roman Kagan
2016-05-26 20:19 ` Radim Krčmář
2016-05-27 17:28   ` Roman Kagan
2016-05-27 18:11     ` Radim Krčmář
2016-05-27 18:46       ` Roman Kagan
2016-05-27 19:29         ` Radim Krčmář
2016-05-29 23:38 ` Marcelo Tosatti
2016-06-09  3:27   ` Marcelo Tosatti
2016-06-09  3:45     ` Marcelo Tosatti
2016-06-09 12:09     ` Roman Kagan
2016-06-09 18:25       ` Marcelo Tosatti
2016-06-09 19:19         ` Roman Kagan
2016-06-13 17:07         ` Roman Kagan
2016-06-14 22:11           ` Marcelo Tosatti [this message]
2016-06-13 17:19         ` Roman Kagan
2016-06-17 22:21           ` Marcelo Tosatti
2016-06-20 17:22             ` Roman Kagan
2016-06-20 21:29               ` Marcelo Tosatti
2016-06-21 14:40                 ` Roman Kagan
2016-06-21 21:28                   ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160614221127.GA10663@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=den@openvz.org \
    --cc=kvm@vger.kernel.org \
    --cc=osh@google.com \
    --cc=pbonzini@redhat.com \
    --cc=rkagan@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox