From: Marcelo Tosatti <mtosatti@redhat.com>
To: Roman Kagan <rkagan@virtuozzo.com>,
kvm@vger.kernel.org, "Denis V. Lunev" <den@openvz.org>,
Owen Hofmann <osh@google.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] x86/kvm: fix condition to update kvm master clocks
Date: Thu, 9 Jun 2016 15:25:02 -0300 [thread overview]
Message-ID: <20160609182501.GA24024@amt.cnet> (raw)
In-Reply-To: <20160609120945.GB2570@rkaganb.sw.ru>
On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > Ok, its not feasible to keep both REF_CLOCK (MSR) and shared memory
> > (REF_PAGE) clocks in sync. Even if the frequency correction is the same
> > for both, the kernel updates monotonic clock differently than the
> > stated frequency that is:
> >
> > monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq
> >
> > So the best solution IMO is to:
> >
> > reads of guest clock = max(shared memory clock, get_kernel_ns() +
> > kvmclock_offset)
> >
> > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> >
> > Unless someone has a better idea, Roman, can you update your patch to
> > include such solution? for kvm_get_time_and_clockread, can fast forward
> > kvmclock_offset so that
> >
> > kvmclock_offset + get_kernel_ns() = shared memory clock
>
> I'm not sure I understand what you mean.
>
> ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> at the time kvm_get_time_and_clockread() runs they are in sync by
> definition. They can diverge later due to different whole number math
> applied.
Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
read from the guest).
Add wrapper around get_kernel_ns + kvmclock_offset reads:
Record somewhere the last returned (last_returned_guestclock).
u64 read_guest_clock(struct kvm *kvm)
{
mutex_lock(&guest_clock_mutex);
ret = get_kernel_ns() + kvmclock_offset;
kvm->arch.last_returned_guestclock = ret;
mutex_unlock(&guest_clock_mutex);
}
Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):
1. read guest shared memory = smval.
2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
3. if (kclock < smval)
kvmclock_offset += smval - kclock
Two possibilites for clocks state:
P1. shared memory clock > get_kernel_ns() + kvmclock_offset
P2. shared memory clock < get_kernel_ns() + kvmclock_offset
Two possibilites for guest behaviour:
G1. a = shared memory clock read;
b = get_kernel_ns() + kvmclock_offset
G1/P1:
a = shared memory clock read (previous read, stored in memory)
b = get_kernel_ns() + kvmclock_offset
After sync code above: Note smval > a, so b = smval > a
G1/P2:
a = shared memory clock read (previous read, stored in memory)
b = get_kernel_ns() + kvmclock_offset
a < b, fine.
G2 (second possibility for guest behaviour)
a = get_kernel_ns() + kvmclock_offset
b = shared memory clock read
G2/P1: fine, b > a.
G2/P2:
a = get_kernel_ns() + kvmclock_offset >
b = shared memory clock read
So we have to either reduce get_kernel_ns() + kvmclock_offset so that
b is larger or 'stop get_kernel_ns() + kvmclock_offset'.
Can make get_kernel_ns() + kvmclock_offset be as small as
last_returned_guestclock (otherwise users of get_kernel_ns() +
kvmclock_offset can see time backwards).
0. mutex_lock(&guest_clock_mutex);
01. getkernelns = get_kernel_ns();
1. read guest shared memory = smval.
2. kclock = getkernelns + kvmclock_offset
3. if (kclock > smval)
kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
kclock - smval)
4. kclock = getkernelns + kvmclock_offset
5. if (kclock > smval) {
schedule_timeout(kclock - smval);
kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
kclock - smval)
}
6. mutex_unlock(&guest_clock_mutex);
That works, right?
I'll code that patch next week if you don't beat me to it.
> There's also a problem that there can be arbitrary amount of time
> between assigning the return value for guest's rdmsr and actually
> entering the guest to deliver that value.
Don't think that matters, see the 4 cases above.
> In general I'm starting to feel the shared memory clock is trying to
> provide stronger guarantees than really useful. E.g. there's no such
> thing as synchronous TSC between vCPUs in a virtual machine, so every
> guest assuming it is broken;
There is, the TSC is monotonic between pCPUs:
pCPU1 | pCPU2
1. a = read tsc
2. b = read tsc.
> in reality that means that every sane guest
> must tolerate certain violations of monotonicity when multiple CPUs are
> used for timekeeping. I wonder if this consideration can allow for some
> simplification of the paravirtual clock code...
I think applications can fail.
next prev parent reply other threads:[~2016-06-09 18:25 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-26 14:49 [PATCH] x86/kvm: fix condition to update kvm master clocks Roman Kagan
2016-05-26 20:19 ` Radim Krčmář
2016-05-27 17:28 ` Roman Kagan
2016-05-27 18:11 ` Radim Krčmář
2016-05-27 18:46 ` Roman Kagan
2016-05-27 19:29 ` Radim Krčmář
2016-05-29 23:38 ` Marcelo Tosatti
2016-06-09 3:27 ` Marcelo Tosatti
2016-06-09 3:45 ` Marcelo Tosatti
2016-06-09 12:09 ` Roman Kagan
2016-06-09 18:25 ` Marcelo Tosatti [this message]
2016-06-09 19:19 ` Roman Kagan
2016-06-13 17:07 ` Roman Kagan
2016-06-14 22:11 ` Marcelo Tosatti
2016-06-13 17:19 ` Roman Kagan
2016-06-17 22:21 ` Marcelo Tosatti
2016-06-20 17:22 ` Roman Kagan
2016-06-20 21:29 ` Marcelo Tosatti
2016-06-21 14:40 ` Roman Kagan
2016-06-21 21:28 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160609182501.GA24024@amt.cnet \
--to=mtosatti@redhat.com \
--cc=den@openvz.org \
--cc=kvm@vger.kernel.org \
--cc=osh@google.com \
--cc=pbonzini@redhat.com \
--cc=rkagan@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox