From: Dongli Zhang <dongli.zhang@oracle.com>
To: kvm@vger.kernel.org, x86@kernel.org, linux-kselftest@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, vkuznets@redhat.com,
tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, shuah@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kprateek.nayak@amd.com, jgross@suse.com,
dwmw2@infradead.org, joe.jin@oracle.com
Subject: [PATCH 0/5] Fix and enhance KVM steal accounting for both guest and host
Date: Mon, 4 May 2026 17:30:13 -0700
Message-ID: <20260505003044.78693-1-dongli.zhang@oracle.com>
This patchset resolves two issues related to KVM steal time accounting.
1. KVM does not support vCPU hotplug. When a vCPU is removed, its
corresponding data structures are not freed by KVM. Instead, QEMU destroys
only the userspace state and the vCPU thread, while the KVM vCPU fd remains
open and parked in QEMU.
As a result, vcpu->arch.st.last_steal is not reset. If the same vCPU is
later re-created by QEMU, last_steal retains its old value, while
current->sched_info.run_delay starts from zero since a new vCPU thread is
created. This causes current->sched_info.run_delay - vcpu->arch.st.last_steal
to produce a large, bogus value.
For instance, current->sched_info.run_delay can become smaller than
vcpu->arch.st.last_steal (see line 3832) if a QEMU vCPU is re-added after
it has previously been removed.
As a result, st->steal restarts from a very small value, close to
current->sched_info.run_delay.
3748 static void record_steal_time(struct kvm_vcpu *vcpu)
3749 {
... ...
3831 	unsafe_get_user(steal, &st->steal, out);
3832 	steal += current->sched_info.run_delay -
3833 		vcpu->arch.st.last_steal;
3834 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
3835 	unsafe_put_user(steal, &st->steal, out);
This means that, from the guest VM, paravirt_steal_clock() for a newly added
vCPU starts from a very small value.
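To make the arithmetic concrete, here is a minimal user-space sketch of the update at lines 3832-3834. It is not kernel code: struct steal_model and model_record_steal() are hypothetical names, and the model assumes st->steal and last_steal had accumulated the same total before the vCPU was removed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; field names only mirror the kernel ones. */
struct steal_model {
	uint64_t steal;		/* models st->steal in the guest-visible area */
	uint64_t last_steal;	/* models vcpu->arch.st.last_steal */
};

/*
 * Same arithmetic as lines 3832-3834. If run_delay is smaller than
 * last_steal, the u64 subtraction wraps, and adding the wrapped delta
 * pulls steal down towards run_delay.
 */
static uint64_t model_record_steal(struct steal_model *m, uint64_t run_delay)
{
	assert(m);
	m->steal += run_delay - m->last_steal;
	m->last_steal = run_delay;
	return m->steal;
}
```

With steal and last_steal both at 5000000000 ns and a new vCPU thread whose run_delay is 1000000 ns, the model returns 1000000, i.e. the guest-visible counter jumps backwards.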
Since this_rq()->prev_steal_time is not reset during vCPU hotplug, it may
exceed paravirt_steal_clock(). This results in a negative delta (interpreted as
a large u64) being accounted to cpustat[CPUTIME_STEAL], causing it to appear
either very small or to start from a large u64 value (see line 275).
268 static __always_inline u64 steal_account_process_time(u64 maxtime)
269 {
270 #ifdef CONFIG_PARAVIRT
271 	if (static_key_false(&paravirt_steal_enabled)) {
272 		u64 steal;
273
274 		steal = paravirt_steal_clock(smp_processor_id());
275 		steal -= this_rq()->prev_steal_time;
276 		steal = min(steal, maxtime);
277 		account_steal_time(steal);
278 		this_rq()->prev_steal_time += steal;
279
280 		return steal;
281 	}
282 #endif /* CONFIG_PARAVIRT */
283 	return 0;
284 }
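The guest-side effect can be reproduced with a small user-space sketch of lines 274-278. This is only a model under stated assumptions: struct rq_model and model_steal_account() are hypothetical, cpustat_steal stands in for cpustat[CPUTIME_STEAL], and the steal clock value is passed in as a plain argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the per-runqueue state. */
struct rq_model {
	uint64_t prev_steal_time;	/* models this_rq()->prev_steal_time */
	uint64_t cpustat_steal;		/* models cpustat[CPUTIME_STEAL] */
};

/*
 * Same arithmetic as lines 274-278. When steal_clock is smaller than
 * prev_steal_time, the u64 subtraction wraps to a huge value, which
 * is then clamped to maxtime before being accounted.
 */
static uint64_t model_steal_account(struct rq_model *rq, uint64_t steal_clock,
				    uint64_t maxtime)
{
	uint64_t steal;

	assert(rq);
	steal = steal_clock - rq->prev_steal_time;
	if (steal > maxtime)
		steal = maxtime;
	rq->cpustat_steal += steal;
	rq->prev_steal_time += steal;
	return steal;
}
```

In this model, a stale prev_steal_time of 5000000000 ns against a steal clock of 1000000 ns charges the full maxtime as steal, and prev_steal_time moves further away from the steal clock instead of converging on it.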
This patchset fixes the issue on both the KVM guest and host sides by resetting
prev_steal_time/prev_steal_time_rq and vcpu->arch.st.last_steal when KVM steal
time is enabled.
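The host-side idea can be sketched in user space as follows. This is not the code from the patches: the names are hypothetical, and the sketch assumes that "resetting" means re-snapshotting the baseline from the current thread's run_delay at the moment steal time is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the per-vCPU steal state. */
struct vcpu_model {
	uint64_t steal;		/* models st->steal */
	uint64_t last_steal;	/* models vcpu->arch.st.last_steal */
};

/* Assumed reset: take a fresh baseline when steal time is (re-)enabled. */
static void model_enable_steal_time(struct vcpu_model *v, uint64_t run_delay)
{
	assert(v);
	v->last_steal = run_delay;
}

/* Same accumulation as record_steal_time(), on the model state. */
static uint64_t model_record(struct vcpu_model *v, uint64_t run_delay)
{
	assert(v);
	v->steal += run_delay - v->last_steal;
	v->last_steal = run_delay;
	return v->steal;
}
```

With the baseline re-snapshotted, the first delta after re-adding the vCPU only covers the delay accrued by the new thread since enabling, so the modelled steal counter keeps increasing monotonically.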
2. KVM_CLOCK_REALTIME was introduced to help track the downtime of
live migration. KVM uses that realtime value to advance the guest clock, but
the same blackout is not reflected in KVM steal time.
Account that same delta in steal time directly in kvm_vm_ioctl_set_clock(),
only when KVM_CLOCK_REALTIME is used. This keeps the KVM-only solution
self-contained and avoids adding a new KVM ioctl or requiring additional
userspace changes (e.g. in QEMU).
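A rough user-space sketch of the downtime computation follows. It is an illustration only: MODEL_CLOCK_REALTIME is a hypothetical flag standing in for KVM_CLOCK_REALTIME, struct clock_model is not the real struct kvm_clock_data, and how the patch actually applies the delta to each vCPU's steal time is not shown in this cover letter, so adding it to a single counter is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for KVM_CLOCK_REALTIME. */
#define MODEL_CLOCK_REALTIME	(1u << 0)

/* Hypothetical stand-in for the data passed to KVM_SET_CLOCK. */
struct clock_model {
	uint32_t flags;
	uint64_t realtime;	/* host realtime (ns) saved on the source */
};

/*
 * Downtime is the realtime that elapsed between saving and restoring
 * the clock. It is zero if the flag is absent or time went backwards.
 */
static uint64_t model_downtime_ns(const struct clock_model *data,
				  uint64_t now_real_ns)
{
	assert(data);
	if (!(data->flags & MODEL_CLOCK_REALTIME))
		return 0;
	if (now_real_ns <= data->realtime)
		return 0;
	return now_real_ns - data->realtime;
}

/* Assumed accounting step: charge the blackout to a steal counter. */
static void model_account_downtime(uint64_t *steal, uint64_t downtime_ns)
{
	assert(steal);
	*steal += downtime_ns;
}
```

The guard against now_real_ns being at or before the saved realtime keeps a clock that was stepped backwards on the destination from producing a wrapped, huge downtime.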
I have also created two KVM selftests.
Dongli Zhang (5):
  x86/kvm: Reset prev_steal_time and prev_steal_time_rq when enabling steal time
  KVM: x86: Reset vcpu->arch.st.last_steal when enabling steal time
  KVM: x86: account KVM_SET_CLOCK downtime in steal time
  KVM: selftests: Test steal time when re-adding a vCPU on a new thread
  KVM: selftests: Test KVM_SET_CLOCK downtime in steal time

 arch/x86/include/asm/kvm_host.h                    |   4 +
 arch/x86/kernel/kvm.c                              |  40 +++---
 arch/x86/kvm/x86.c                                 |  32 ++++-
 include/linux/sched/cputime.h                      |   2 +
 kernel/sched/cputime.c                             |  10 ++
 tools/testing/selftests/kvm/Makefile.kvm           |   1 +
 .../testing/selftests/kvm/x86/kvm_clock_test.c     |  42 ++++--
 .../selftests/kvm/x86/steal_time_reset_test.c      | 144 +++++++++++++++++++
 8 files changed, 248 insertions(+), 27 deletions(-)
Thank you very much!
Dongli Zhang