From: Paolo Bonzini <pbonzini@redhat.com>
To: John Stultz <john.stultz@linaro.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>,
Radim Krcmar <rkrcmar@redhat.com>, kvm list <kvm@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
lkml <linux-kernel@vger.kernel.org>,
x86@kernel.org, rkagan@virtuozzo.com, den@virtuozzo.com,
Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH v4 00/10] make L2's kvm-clock stable, get rid of pvclock_gtod_copy in KVM
Date: Tue, 22 Aug 2017 17:00:53 -0400 (EDT)
Message-ID: <894362115.582988.1503435653874.JavaMail.zimbra@redhat.com>
In-Reply-To: <CALAqxLWGf6BAdji=rqKPnyvFzzUQscA5rXU8SY4RcWXymyVL1Q@mail.gmail.com>
> I still don't feel my questions have been well answered. It's really
> not clear to me why, in order to allow the level-2 guest to use a vdso,
> the answer is to export more data through the entire stack rather
> than to make the kvmclock usable from the vsyscall.
Thanks, this helps.
A stable kvmclock is already usable from the vsyscall. It is however not
yet usable _in the hypervisor_ as a way to provide another stable kvmclock
to the nested guest; right now the only clocksource that a hypervisor can
use to provide a stable kvmclock is the TSC.
So, regarding the "why is it necessary" part. Even on a modern host with
invariant TSC, kvmclock mediates between TSC and the guest and provides for
example support for live migration, where the TSC frequency may be
different between source and destination. If the L1 hypervisor could
use the TSC to provide a stable kvmclock, there would be no need for kvmclock
in the first place. The paravirtualized clock may well disappear in a few
years since Skylake provides TSC scaling. However, I'm not that optimistic
because people are complaining that I removed support for 2007 processors
and it seems that I'll have to put it back. So, as more people use nested
virtualization (and we have nested virt migration in the works, too), nested
kvmclock becomes more important too.
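
To make the "mediates between TSC and the guest" point concrete, here is a
minimal user-space sketch of the pvclock scaling step. The struct is a
simplified stand-in for the record the host shares with the guest, not the
kernel's definition; the name pvclock_sample and the sample values in the
usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of the per-vCPU time record. */
struct pvclock_sample {
    uint64_t tsc_timestamp;     /* TSC value when the host filled in the record */
    uint64_t system_time;       /* guest nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 0.32 fixed-point multiplier (ns per cycle) */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Convert a raw TSC reading into guest nanoseconds. */
static uint64_t pvclock_read_ns(const struct pvclock_sample *s, uint64_t tsc)
{
    uint64_t delta = tsc - s->tsc_timestamp;
    uint64_t hi, lo;

    assert(s->tsc_shift > -64 && s->tsc_shift < 64);
    if (s->tsc_shift >= 0)
        delta <<= s->tsc_shift;
    else
        delta >>= -s->tsc_shift;

    /* (delta * mul) >> 32, computed in two halves to stay within 64 bits */
    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return s->system_time
         + hi * s->tsc_to_system_mul
         + ((lo * s->tsc_to_system_mul) >> 32);
}
```

With a 2 GHz source host (multiplier 0x80000000, shift 0) and a 4 GHz
destination host (same multiplier, shift -1), the guest keeps reading
nanoseconds through the same function; only the record changes across the
migration. The guest never needs to know the TSC frequency itself.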
Regarding the "why is it best" part. Right now, the hypervisor makes a
copy of the timekeeper information in order to prepare the stable kvmclock.
This code is very much tied to the TSC. However, a snapshot of the timekeeper
information is almost exactly what ktime_get_snapshot already returns,
so my suggestion to "untie" the hypervisor code from the TSC was to use
ktime_get_snapshot instead. This way, the clocksource itself tells KVM
whether it can be the base for a vsyscall-happy kvmclock (which means, it
must be the TSC or a linear transformation of it).
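
As a rough illustration of "the clocksource itself tells KVM", here is a
small user-space model. It is not the kernel API and not the code from the
series: clocksource_model, snapshot_model, read_with_stamp and the fixed
counter values are all hypothetical names and numbers chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clocksource with an extended read callback. The callback
 * returns true when the reading is the TSC or a linear transformation of
 * it, and then also reports the underlying TSC value in *tsc_stamp. */
struct clocksource_model {
    const char *name;
    bool (*read_with_stamp)(uint64_t *cycles, uint64_t *tsc_stamp);
};

/* Hypothetical snapshot: what the hypervisor would keep. */
struct snapshot_model {
    uint64_t cycles;
    uint64_t tsc_stamp;
    bool     tsc_based;  /* usable as the base for a stable kvmclock? */
};

static bool model_tsc_read(uint64_t *cycles, uint64_t *tsc_stamp)
{
    uint64_t tsc = 123456;  /* fixed stand-in for a TSC read */
    *cycles = tsc;
    *tsc_stamp = tsc;
    return true;
}

static bool model_kvmclock_read(uint64_t *cycles, uint64_t *tsc_stamp)
{
    uint64_t tsc = 123456;  /* same stand-in TSC value */
    *tsc_stamp = tsc;
    /* linear in the TSC: base of 5000 ns plus 0.5 ns per cycle since 100000 */
    *cycles = 5000 + (tsc - 100000) / 2;
    return true;            /* assumes the host marked the clock stable */
}

static bool model_hpet_read(uint64_t *cycles, uint64_t *tsc_stamp)
{
    *cycles = 42;           /* unrelated counter */
    *tsc_stamp = 0;
    return false;           /* not derived from the TSC */
}

/* The snapshot code asks the clocksource; it has no TSC-specific logic. */
static struct snapshot_model take_snapshot(const struct clocksource_model *cs)
{
    struct snapshot_model snap;

    assert(cs && cs->read_with_stamp);
    snap.tsc_based = cs->read_with_stamp(&snap.cycles, &snap.tsc_stamp);
    return snap;
}
```

In this model an L1 hypervisor running on the kvmclock model would see
tsc_based set and could hand a stable clock to L2, while the HPET-like model
would make it fall back to the slow path.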
While I am very happy with how the KVM code comes out, it may well not be
the best solution; I definitely need help from the clocksource
maintainers here, not just approval! It doesn't help that
a lot of code surrounding ktime_get_snapshot is unused, so that may have
sent me off track.
In particular, the return value of the new callback can be defined as "is
it the TSC or a linear transformation of it". But that's as good a definition
as "is it good for KVM" (i.e., not very good) without some documentation on
the meaning of "cycles" in the struct returned by ktime_get_snapshot. Once I
understand that, I hope I can provide a better explanation for the return
value of the callback.
Paolo
> So far for a problem statement, all I've got is:
> "However, when using nested virtualization you have
>
> L0: bare-metal hypervisor (uses TSC)
> L1: nested hypervisor (uses kvmclock, can use vsyscall)
> L2: nested guest
>
> and L2 cannot use vsyscall because it is not using the TSC."
>
> Which is a start but doesn't really make it clear why the proposed
> solution is best/necessary.
>
> thanks
> -john
>