From: Sean Christopherson <seanjc@google.com>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
Joe Jin <joe.jin@oracle.com>,
x86@kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, pbonzini@redhat.com,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
Date: Mon, 2 Oct 2023 11:18:50 -0700 [thread overview]
Message-ID: <ZRsJiuKdXtWos_Xh@google.com> (raw)
In-Reply-To: <a8479764-34a1-334f-3865-c01325d772d9@oracle.com>
+PeterZ
Thomas and Peter,
We're trying to address an issue where KVM's paravirt kvmclock drifts from the
host's TSC-based monotonic raw clock because of historical reasons (at least, AFAICT),
even when the TSC is constant. Due to some dubious KVM behavior, KVM may sometimes
re-sync kvmclock against the host's monotonic raw clock, which causes non-trivial
jumps in time from the guest's perspective.
Linux-as-a-guest demotes all paravirt clock sources when the TSC is constant and
nonstop, and so the goofy KVM behavior isn't likely to affect the guest's clocksource,
but the guest's sched_clock() implementation keeps using the paravirt clock.
Irrespective of if/how we fix the KVM host-side mess, using a paravirt clock for
the scheduler when using a constant, nonstop TSC for the clocksource seems at best
inefficient, and at worst unnecessarily complex and risky.
Is there any reason not to prefer native_sched_clock() over whatever paravirt
clock is present when the TSC is the preferred clocksource? Assuming the desirable
thing to do is to use native_sched_clock() in this scenario, do we need a separate
rating system, or can we simply tie the sched clock selection to the clocksource
selection, e.g. override the paravirt stuff if the TSC clock has higher priority
and is chosen?
Some more details below (and in the rest of the thread).
Thanks!
On Mon, Oct 02, 2023, Dongli Zhang wrote:
> Hi Sean and David,
>
> On 10/2/23 09:37, Sean Christopherson wrote:
> > However, why does any of this matter if the host has a constant TSC? If that's
> > the case, a sane setup will expose a constant TSC to the guest and the guest will
> > use the TSC instead of kvmclock for the guest clocksource.
> >
> > Dongli, is this for long-lived "legacy" guests that were created on hosts without
> > a constant TSC? If not, then why is kvmclock being used? Or heaven forbid, are
> > you running on hardware without a constant TSC? :-)
>
> This is for test guests, and the host has all of below:
>
> tsc, rdtscp, constant_tsc, nonstop_tsc, tsc_deadline_timer, tsc_adjust
>
> A clocksource is used for two things.
>
>
> 1. current_clocksource. Yes, I agree we should always use tsc on modern hardware.
>
> Do we need to update the documentation to always suggest TSC when it is
> constant, as I believe many users still prefer pv clock than tsc?
>
> Thanks to tsc ratio scaling, the live migration will not impact tsc.
>
> >From the source code, the rating of kvm-clock is still higher than tsc.
>
> BTW., how about to decrease the rating if guest detects constant tsc?
>
> 166 struct clocksource kvm_clock = {
> 167 .name = "kvm-clock",
> 168 .read = kvm_clock_get_cycles,
> 169 .rating = 400,
> 170 .mask = CLOCKSOURCE_MASK(64),
> 171 .flags = CLOCK_SOURCE_IS_CONTINUOUS,
> 172 .enable = kvm_cs_enable,
> 173 };
>
> 1196 static struct clocksource clocksource_tsc = {
> 1197 .name = "tsc",
> 1198 .rating = 300,
> 1199 .read = read_tsc,
That's already done in kvmclock_init().
if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
!check_tsc_unstable())
kvm_clock.rating = 299;
See also: https://lore.kernel.org/all/ZOjF2DMBgW%2FzVvL3@google.com
> 2. The sched_clock.
>
> The scheduling is impacted if there is big drift.
...
> Unfortunately, the "no-kvmclock" kernel parameter disables all pv clock
> operations (not only sched_clock), e.g., after line 300.
...
> Should I introduce a new param to disable no-kvm-sched-clock only, or to
> introduce a new param to allow the selection of sched_clock?
I don't think we want a KVM-specific knob, because every flavor of paravirt guest
would need to do the same thing. And unless there's a good reason to use a
paravirt clock, this really shouldn't be something the guest admin needs to opt
into using.
next prev parent reply other threads:[~2023-10-02 18:18 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 23:06 [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically Dongli Zhang
2023-09-27 0:29 ` Joe Jin
2023-09-27 0:36 ` Dongli Zhang
2023-09-28 16:18 ` Sean Christopherson
2023-09-29 20:15 ` Dongli Zhang
2023-10-02 8:33 ` David Woodhouse
2023-10-02 16:37 ` Sean Christopherson
2023-10-02 17:17 ` Dongli Zhang
2023-10-02 18:18 ` Sean Christopherson [this message]
2023-10-02 21:06 ` Peter Zijlstra
2023-10-02 21:16 ` Peter Zijlstra
2023-10-02 18:16 ` David Woodhouse
2023-10-03 0:53 ` Sean Christopherson
2023-10-03 1:32 ` Dongli Zhang
2023-10-03 1:49 ` Sean Christopherson
2023-10-03 2:07 ` Dongli Zhang
2023-10-03 21:00 ` Sean Christopherson
2023-10-03 5:54 ` David Woodhouse
2023-10-04 0:04 ` Sean Christopherson
2023-10-04 10:01 ` David Woodhouse
2023-10-04 18:06 ` Sean Christopherson
2023-10-04 19:13 ` Dongli Zhang
2023-10-11 0:20 ` Sean Christopherson
2023-10-11 7:18 ` David Woodhouse
2023-10-13 18:07 ` Sean Christopherson
2023-10-13 18:21 ` David Woodhouse
2023-10-13 19:02 ` Sean Christopherson
2023-10-13 19:12 ` David Woodhouse
2023-10-13 20:03 ` Sean Christopherson
2023-10-13 20:12 ` Dongli Zhang
2023-10-13 23:26 ` Sean Christopherson
2023-10-14 9:49 ` David Woodhouse
2023-10-16 15:47 ` Dongli Zhang
2023-10-16 16:25 ` David Woodhouse
2023-10-16 17:04 ` Dongli Zhang
2023-10-16 18:49 ` Sean Christopherson
2023-10-16 22:04 ` Dongli Zhang
2023-10-16 22:48 ` Sean Christopherson
2023-10-17 16:18 ` Dongli Zhang
2023-10-03 9:12 ` David Woodhouse
2023-10-04 0:07 ` Sean Christopherson
2023-10-04 8:06 ` David Woodhouse
2023-10-03 14:29 ` David Woodhouse
2023-10-04 0:10 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZRsJiuKdXtWos_Xh@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dongli.zhang@oracle.com \
--cc=dwmw2@infradead.org \
--cc=joe.jin@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox