From: Dongli Zhang <dongli.zhang@oracle.com>
To: qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org, dwmw2@infradead.org
Subject: Should QEMU (accel=kvm) kvm-clock/guest_tsc stop counting during downtime blackout?
Date: Mon, 22 Sep 2025 09:37:48 -0700
Message-ID: <2d375ec3-a071-4ae3-b03a-05a823c48016@oracle.com>
Hi,
Could you help confirm whether kvm-clock/guest_tsc should stop counting
elapsed time during a downtime blackout?
1. guest_clock=T1, realtime=R1.
2. (qemu) stop
3. Wait for several seconds.
4. (qemu) cont
5. guest_clock=T2, realtime=R2.
Should T1 == T2, or should R2 - R1 == T2 - T1?
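The two candidate answers can be written down as a tiny model. This is only an
illustrative sketch: the enum and function names are hypothetical and are not
part of QEMU or KVM, and it assumes realtime does not step backwards between
'stop' and 'cont'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum downtime_policy {
    DOWNTIME_FROZEN,   /* guest clock stops while the VM is stopped: T2 == T1 */
    DOWNTIME_COUNTED,  /* guest clock follows realtime: T2 - T1 == R2 - R1 */
};

/* Guest clock (ns) observed right after 'cont', given the values at 'stop'. */
static uint64_t guest_clock_at_cont(enum downtime_policy policy,
                                    uint64_t t1_ns, uint64_t r1_ns,
                                    uint64_t r2_ns)
{
    /* Assumption of this sketch: realtime did not step backwards. */
    assert(r2_ns >= r1_ns);

    if (policy == DOWNTIME_COUNTED) {
        return t1_ns + (r2_ns - r1_ns);
    }
    return t1_ns;
}
```

With T1 = 100, R1 = 1000 and R2 = 15000, the frozen policy yields T2 = 100 and
the counted policy yields T2 = 14100.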
For instance, suppose the guest clocksource is 'tsc'. It keeps incrementing
during the QEMU downtime blackout.
[root@vm ~]# while true; do date; sleep 1; done
Tue Sep 9 15:28:37 PDT 2025
Tue Sep 9 15:28:38 PDT 2025
Tue Sep 9 15:28:39 PDT 2025
Tue Sep 9 15:28:40 PDT 2025
Tue Sep 9 15:28:41 PDT 2025
Tue Sep 9 15:28:42 PDT 2025
Tue Sep 9 15:28:43 PDT 2025 ===> (qemu) stop, wait for 14 seconds.
---> 14 seconds!
Tue Sep 9 15:28:57 PDT 2025 ===> (qemu) cont
Tue Sep 9 15:28:58 PDT 2025
Tue Sep 9 15:28:59 PDT 2025
Tue Sep 9 15:29:00 PDT 2025
Tue Sep 9 15:29:01 PDT 2025
However, 'kvm-clock' stops incrementing during the blackout.
[root@vm ~]# while true; do date; sleep 1; done
Tue Sep 9 15:35:59 PDT 2025
Tue Sep 9 15:36:00 PDT 2025
Tue Sep 9 15:36:01 PDT 2025
Tue Sep 9 15:36:02 PDT 2025
Tue Sep 9 15:36:03 PDT 2025 ===> (qemu) stop, wait for many seconds.
---> No gap!
Tue Sep 9 15:36:04 PDT 2025 ===> (qemu) cont
Tue Sep 9 15:36:05 PDT 2025
Tue Sep 9 15:36:06 PDT 2025
Tue Sep 9 15:36:07 PDT 2025
Tue Sep 9 15:36:08 PDT 2025
Tue Sep 9 15:36:09 PDT 2025
Tue Sep 9 15:36:10 PDT 2025
Tue Sep 9 15:36:11 PDT 2025
Tue Sep 9 15:36:12 PDT 2025
There are many use cases that can involve a long or short downtime blackout.
- stop/cont
- savevm/loadvm
- live migration, especially from/to a file.
- dump-guest-memory
- cpr?
KVM already exposes 'KVM_CLOCK_REALTIME' and 'KVM_VCPU_TSC_OFFSET' to help
count all elapsed time.
https://lore.kernel.org/all/20210916181538.968978-1-oupton@google.com/
This is a prototype to demonstrate how QEMU can count elapsed downtime by taking
advantage of 'KVM_CLOCK_REALTIME'.
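As I understand it, when KVM_SET_CLOCK is called with KVM_CLOCK_REALTIME set,
the kernel advances the supplied clock value by the host realtime that has
elapsed since the saved 'realtime' value, and it does not step the clock
backwards. The standalone model below sketches that behavior. It is not the
kernel's code: 'struct model_clock_data', 'MODEL_CLOCK_REALTIME' and
'model_set_clock' are hypothetical stand-ins for 'struct kvm_clock_data', the
real flag and the ioctl handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag for this model; not the real KVM_CLOCK_REALTIME value. */
#define MODEL_CLOCK_REALTIME (1u << 0)

/* Hypothetical, simplified stand-in for struct kvm_clock_data. */
struct model_clock_data {
    uint64_t clock;     /* kvm-clock value in ns, taken at save time */
    uint32_t flags;
    uint64_t realtime;  /* host CLOCK_REALTIME in ns, taken at save time */
};

/*
 * Return the kvm-clock value that would be installed on restore, given the
 * host realtime at restore. Elapsed realtime is added only when the flag is
 * set and realtime has moved forward, so the clock never goes backwards.
 */
static uint64_t model_set_clock(const struct model_clock_data *data,
                                uint64_t now_real_ns)
{
    assert(data != NULL);

    uint64_t clock = data->clock;

    if ((data->flags & MODEL_CLOCK_REALTIME) &&
        now_real_ns > data->realtime) {
        clock += now_real_ns - data->realtime;
    }
    return clock;
}
```

Without the flag, the saved value is installed unchanged, which corresponds to
the "No gap!" behavior shown above for 'kvm-clock'.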
From b97a514ac227645010ce3d1012af3a4943413844 Mon Sep 17 00:00:00 2001
From: Dongli Zhang <dongli.zhang@oracle.com>
Date: Thu, 18 Sep 2025 14:59:42 -0700
Subject: [PATCH 1/1] target/i386/kvm: take advantage of KVM_CLOCK_REALTIME
The Linux kernel commit c68dc1b577ea ("KVM: x86: Report host tsc and
realtime values in KVM_GET_CLOCK") introduced the 'realtime' field and
the KVM_CLOCK_REALTIME flag.
The 'realtime' value is saved through KVM_GET_CLOCK and restored via
KVM_SET_CLOCK. This enables the KVM clock to advance by the amount of
realtime that elapsed during downtime in operations such as live
migration, stop/cont, and savevm/loadvm.
This patch/feature allows QEMU to take advantage of KVM_CLOCK_REALTIME.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
hw/i386/kvm/clock.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index f56382717f..906346ce2f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -38,6 +38,8 @@ struct KVMClockState {
/*< public >*/
uint64_t clock;
+ uint64_t realtime;
+ uint32_t flags;
bool clock_valid;
/* whether the 'clock' value was obtained in the 'paused' state */
@@ -107,7 +109,10 @@ static void kvm_update_clock(KVMClockState *s)
fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
abort();
}
+
s->clock = data.clock;
+ s->flags = data.flags & KVM_CLOCK_REALTIME;
+ s->realtime = data.realtime;
/* If kvm_has_adjust_clock_stable() is false, KVM_GET_CLOCK returns
* essentially CLOCK_MONOTONIC plus a guest-specific adjustment. This
@@ -186,6 +191,11 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
s->clock_valid = false;
data.clock = s->clock;
+ if (s->flags & KVM_CLOCK_REALTIME) {
+ data.flags = s->flags;
+ data.realtime = s->realtime;
+ }
+
ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
if (ret < 0) {
fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
@@ -259,6 +269,7 @@ static int kvmclock_pre_load(void *opaque)
KVMClockState *s = opaque;
s->clock_is_reliable = false;
+ s->flags = 0;
return 0;
}
@@ -290,12 +301,14 @@ static int kvmclock_pre_save(void *opaque)
static const VMStateDescription kvmclock_vmsd = {
.name = "kvmclock",
- .version_id = 1,
+ .version_id = 2,
.minimum_version_id = 1,
.pre_load = kvmclock_pre_load,
.pre_save = kvmclock_pre_save,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(clock, KVMClockState),
+ VMSTATE_UINT64(realtime, KVMClockState),
+ VMSTATE_UINT32(flags, KVMClockState),
VMSTATE_END_OF_LIST()
},
.subsections = (const VMStateDescription * const []) {
--
2.39.3
Taking advantage of 'KVM_VCPU_TSC_OFFSET' could further improve 'guest_tsc'.
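As a rough sketch of what that adjustment could look like: if the guest TSC is
the host TSC plus a per-vCPU offset, then the offset to write after the
blackout can be derived from the saved offset, the host TSC values before and
after, and the kvm-clock time that elapsed. The helper names and parameters
below are hypothetical and only do the arithmetic; they make no KVM calls and
assume the guest TSC frequency is known in kHz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a signed nanosecond delta to TSC ticks at tsc_khz.
 * 1 kHz is one tick per millisecond, so the delta is split into whole
 * milliseconds and a sub-millisecond remainder to limit overflow.
 */
static int64_t ns_to_tsc_ticks(int64_t delta_ns, int64_t tsc_khz)
{
    assert(tsc_khz > 0);

    int64_t whole_ms = delta_ns / 1000000;
    int64_t rem_ns = delta_ns % 1000000;

    return whole_ms * tsc_khz + rem_ns * tsc_khz / 1000000;
}

/*
 * Invariant: guest_tsc = host_tsc + offset.
 * Requirement: the guest TSC after the blackout equals the guest TSC before
 * it plus the ticks corresponding to the elapsed kvm-clock time.
 */
static int64_t tsc_offset_after_downtime(int64_t ofs_src,
                                         uint64_t host_tsc_src,
                                         uint64_t host_tsc_dst,
                                         int64_t kvmclock_src_ns,
                                         int64_t kvmclock_dst_ns,
                                         int64_t tsc_khz)
{
    int64_t elapsed_ticks =
        ns_to_tsc_ticks(kvmclock_dst_ns - kvmclock_src_ns, tsc_khz);
    int64_t host_tsc_delta = (int64_t)(host_tsc_src - host_tsc_dst);

    return ofs_src + elapsed_ticks + host_tsc_delta;
}
```

For example, with a 1 GHz guest TSC (tsc_khz = 1000000), a saved offset of
-200, host TSC values of 1000 before and 5000 after, and 14 seconds of elapsed
kvm-clock time, the guest TSC moves from 800 to 14000000800, that is, it
advances by exactly 14 seconds' worth of ticks.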
Any suggestions on whether kvm-clock/guest_tsc should stop or continue counting
during the blackout? Does QEMU have any expectation or requirement here?
Thank you very much!
Dongli Zhang