From: David Woodhouse <dwmw2@infradead.org>
To: Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Juergen Gross <jgross@suse.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
David Woodhouse <dwmw2@infradead.org>,
Paul Durrant <paul@xen.org>, Jonathan Cameron <jic23@kernel.org>,
Sascha Bischoff <Sascha.Bischoff@arm.com>,
Marc Zyngier <maz@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
Jack Allister <jalliste@amazon.com>,
Dongli Zhang <dongli.zhang@oracle.com>,
joe.jin@oracle.com, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v4 25/30] KVM: x86/xen: Prevent runstate times from becoming negative
Date: Sat, 9 May 2026 23:46:51 +0100
Message-ID: <20260509224824.3264567-26-dwmw2@infradead.org>
In-Reply-To: <20260509224824.3264567-1-dwmw2@infradead.org>

From: David Woodhouse <dwmw@amazon.co.uk>

When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
time spent in the previous runstate is accounted. This is based on the
delta between the current KVM clock time, and the previous value stored
in vcpu->arch.xen.runstate_entry_time.

If the KVM clock goes backwards, that delta will be negative. Or, since
it's an unsigned 64-bit integer, very *large*. Linux guests deal with
that particularly badly, reporting 100% steal time for ever more (well,
for *centuries* at least, until the delta has been consumed).
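To put a number on "centuries", here is a minimal standalone sketch in
userspace C. It is not kernel code: the helper names are hypothetical
and stand in for the subtraction done in kvm_xen_update_runstate(). It
shows the same subtraction read as unsigned (the old behaviour) and as
signed (the new behaviour), plus a conversion from nanoseconds to whole
years.

```c
#include <stdint.h>

/* Hypothetical helper: the delta as an unsigned 64-bit value. */
static uint64_t delta_unsigned(uint64_t now, uint64_t entry_time)
{
	/* Wraps modulo 2^64 when now < entry_time. */
	return now - entry_time;
}

/*
 * Hypothetical helper: the same subtraction read as signed.
 * The conversion is implementation-defined before C23, but it is
 * two's-complement on every compiler the kernel supports.
 */
static int64_t delta_signed(uint64_t now, uint64_t entry_time)
{
	return (int64_t)(now - entry_time);
}

/* Whole Julian years (365.25 days) contained in a nanosecond count. */
static uint64_t ns_to_years(uint64_t ns)
{
	const uint64_t ns_per_day = 1000000000ULL * 60 * 60 * 24;

	return ns / (ns_per_day * 36525 / 100);
}
```

With now = 1000 and entry_time = 2000, the signed delta is -1000 ns,
while the unsigned delta is 2^64 - 1000 ns, which ns_to_years() reports
as 584 years.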
So when a negative delta is detected, just refrain from updating the
runstates until the KVM clock catches up with runstate_entry_time again.
Also clamp steal_ns to delta_ns to prevent steal time from exceeding
the total elapsed time, and handle negative steal_ns (which can happen
if run_delay goes backwards across a scheduler update).
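The accounting rules above can be modelled outside the kernel. The
following is a sketch under stated assumptions, not the actual
implementation: struct runstate_model, the RS_* constants and
model_update() are hypothetical names, the clock and run_delay values
are passed in as parameters, the "entry time of zero means offline"
special case is omitted, and the final update of entry_time is assumed
to follow the code shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Xen RUNSTATE_* values. */
enum { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE, RS_COUNT };

struct runstate_model {
	int current;
	uint64_t entry_time;
	uint64_t last_steal;
	uint64_t times[RS_COUNT];
};

/* Returns false if the update was skipped because the clock is behind. */
static bool model_update(struct runstate_model *m, int new_state,
			 uint64_t now, uint64_t run_delay)
{
	int64_t delta_ns = (int64_t)(now - m->entry_time);
	int64_t steal_ns = (int64_t)(run_delay - m->last_steal);

	/* Always track run_delay, even if nothing is accounted below. */
	m->last_steal = run_delay;

	/* Clock went backwards: account nothing until it catches up. */
	if (delta_ns < 0)
		return false;

	/* Only positive steal counts, and only while "running". */
	if (m->current == RS_RUNNING && steal_ns > 0) {
		if (steal_ns > delta_ns)
			steal_ns = delta_ns;
		assert(steal_ns <= delta_ns);
		delta_ns -= steal_ns;
		m->times[RS_RUNNABLE] += (uint64_t)steal_ns;
	}

	m->times[m->current] += (uint64_t)delta_ns;
	m->current = new_state;
	m->entry_time = now;	/* assumed; not visible in the hunk */
	return true;
}
```

In this model, a call with now below entry_time leaves every counter
untouched. A call where steal exceeds the elapsed time attributes the
whole interval to the runnable state and nothing to running. A
negative steal is ignored, so the whole interval counts as running.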
The userspace APIs for setting the runstate times do not allow them to
be set past the current KVM clock, but userspace can still adjust the
KVM clock *after* setting the runstate times, which would cause this
situation to occur.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
---
arch/x86/kvm/xen.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 82e34edbfdbd..fef52b8ea26a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -586,24 +586,33 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
+	vx->last_steal = run_delay;
+
+	/*
+	 * If KVM clock time went backwards, stop updating until it
+	 * catches up (or the runstates are reset by userspace).
+	 */
+	if (delta_ns < 0)
+		return;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 		delta_ns -= steal_ns;
-
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 	vx->runstate_times[vx->current_runstate] += delta_ns;
 	vx->current_runstate = state;
--
2.51.0
Thread overview: 34+ messages
2026-05-09 22:46 [PATCH v4 00/30] Cleaning up the KVM clock mess David Woodhouse
2026-05-09 22:46 ` [PATCH v4 01/30] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 02/30] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2026-05-09 22:46 ` [PATCH v4 03/30] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms David Woodhouse
2026-05-09 22:46 ` [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2026-05-09 22:46 ` [PATCH v4 05/30] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2026-05-09 22:46 ` [PATCH v4 06/30] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2026-05-09 22:46 ` [PATCH v4 07/30] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2026-05-09 22:46 ` [PATCH v4 08/30] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2026-05-09 22:46 ` [PATCH v4 09/30] KVM: x86: WARN if kvm_get_walltime_and_clockread() fails unexpectedly David Woodhouse
2026-05-09 22:46 ` [PATCH v4 10/30] KVM: x86: Fold __get_kvmclock() into get_kvmclock() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 11/30] KVM: x86: Add WARN and restructure get_kvmclock() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 12/30] KVM: x86: Use get_kvmclock_base_ns() as fallback in get_kvmclock() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 13/30] KVM: x86: Fix KVM clock precision in get_kvmclock() with TSC scaling David Woodhouse
2026-05-09 22:46 ` [PATCH v4 14/30] KVM: x86: Use get_kvmclock() in kvm_get_wall_clock_epoch() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 15/30] KVM: x86: Fix compute_guest_tsc() to handle negative time deltas David Woodhouse
2026-05-09 22:46 ` [PATCH v4 16/30] KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling David Woodhouse
2026-05-09 22:46 ` [PATCH v4 17/30] KVM: x86: Simplify and comment kvm_get_time_scale() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 18/30] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 19/30] KVM: x86: Improve synchronization in kvm_synchronize_tsc() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 20/30] KVM: x86: Kill last_tsc_{nsec,write,offset} fields David Woodhouse
2026-05-09 22:46 ` [PATCH v4 21/30] KVM: x86: Replace nr_vcpus_matched_tsc count with all_vcpus_matched_tsc bool David Woodhouse
2026-05-09 22:46 ` [PATCH v4 22/30] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other David Woodhouse
2026-05-09 22:46 ` [PATCH v4 23/30] KVM: x86: Factor out kvm_use_master_clock() David Woodhouse
2026-05-09 22:46 ` [PATCH v4 24/30] KVM: x86: Avoid gratuitous global clock updates David Woodhouse
2026-05-09 22:46 ` David Woodhouse [this message]
2026-05-09 22:46 ` [PATCH v4 26/30] KVM: x86: Avoid redundant masterclock updates from multiple vCPUs David Woodhouse
2026-05-09 22:46 ` [PATCH v4 27/30] KVM: x86: Add KVM_VCPU_TSC_EFFECTIVE_FREQ attribute David Woodhouse
2026-05-09 22:46 ` [PATCH v4 28/30] KVM: x86: Remove runtime Xen TSC frequency CPUID update David Woodhouse
2026-05-09 22:46 ` [PATCH v4 29/30] x86/kvm: Obtain TSC frequency from CPUID if present David Woodhouse
2026-05-09 22:46 ` [PATCH v4 30/30] x86/xen: " David Woodhouse
2026-05-10 20:56 ` [PATCH v4 33/30] KVM: selftests: Add Xen runstate migration test David Woodhouse
2026-05-10 20:58 ` [PATCH v4 31/30] KVM: selftests: Add Xen/generic CPUID timing leaf test David Woodhouse
2026-05-10 21:05 ` [PATCH v4 32/30] KVM: x86: Re-synchronize TSC after KVM_SET_TSC_KHZ David Woodhouse