qemu-devel.nongnu.org archive mirror
* Should QEMU (accel=kvm) kvm-clock/guest_tsc stop counting during downtime blackout?
@ 2025-09-22 16:37 Dongli Zhang
From: Dongli Zhang @ 2025-09-22 16:37 UTC
  To: qemu-devel; +Cc: kvm, dwmw2

Hi,

Could you help confirm whether kvm-clock/guest_tsc should stop counting
elapsed time during a downtime blackout?

1. guest_clock=T1, realtime=R1.
2. (qemu) stop
3. Wait for several seconds.
4. (qemu) cont
5. guest_clock=T2, realtime=R2.

Should (T1 == T2), or (R2 - R1 == T2 - T1)?


For instance, suppose the guest clocksource is 'tsc'. It keeps incrementing
during the QEMU downtime blackout.

[root@vm ~]# while true; do date; sleep 1; done
Tue Sep  9 15:28:37 PDT 2025
Tue Sep  9 15:28:38 PDT 2025
Tue Sep  9 15:28:39 PDT 2025
Tue Sep  9 15:28:40 PDT 2025
Tue Sep  9 15:28:41 PDT 2025
Tue Sep  9 15:28:42 PDT 2025
Tue Sep  9 15:28:43 PDT 2025 ===> (qemu) stop, wait for 14 seconds.
---> 14 seconds!
Tue Sep  9 15:28:57 PDT 2025 ===> (qemu) cont
Tue Sep  9 15:28:58 PDT 2025
Tue Sep  9 15:28:59 PDT 2025
Tue Sep  9 15:29:00 PDT 2025
Tue Sep  9 15:29:01 PDT 2025


However, 'kvm-clock' stops incrementing during the blackout.

[root@vm ~]# while true; do date; sleep 1; done
Tue Sep  9 15:35:59 PDT 2025
Tue Sep  9 15:36:00 PDT 2025
Tue Sep  9 15:36:01 PDT 2025
Tue Sep  9 15:36:02 PDT 2025
Tue Sep  9 15:36:03 PDT 2025 ===> (qemu) stop, wait for many seconds.
---> No gap!
Tue Sep  9 15:36:04 PDT 2025 ===> (qemu) cont
Tue Sep  9 15:36:05 PDT 2025
Tue Sep  9 15:36:06 PDT 2025
Tue Sep  9 15:36:07 PDT 2025
Tue Sep  9 15:36:08 PDT 2025
Tue Sep  9 15:36:09 PDT 2025
Tue Sep  9 15:36:10 PDT 2025
Tue Sep  9 15:36:11 PDT 2025
Tue Sep  9 15:36:12 PDT 2025


There are many use cases that can involve a long or short downtime blackout.

- stop/cont
- savevm/loadvm
- live migration, especially from/to a file.
- dump-guest-memory
- cpr?


KVM already exposes 'KVM_CLOCK_REALTIME' and 'KVM_VCPU_TSC_OFFSET' to help
account for all elapsed time.

https://lore.kernel.org/all/20210916181538.968978-1-oupton@google.com/
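As a rough illustration of the two possible answers to the stop/cont question,
the adjustment that KVM_SET_CLOCK is understood to apply when
KVM_CLOCK_REALTIME is set can be modelled in plain C. This is only a user-space
sketch, not kernel or QEMU code: 'model_clock_data', 'model_set_clock' and
'MODEL_CLOCK_REALTIME' are made-up names, and the flag value is arbitrary.

```c
#include <stdint.h>

/* Hypothetical stand-in for KVM_CLOCK_REALTIME; real code uses <linux/kvm.h>. */
#define MODEL_CLOCK_REALTIME (1u << 0)

/* Hypothetical subset of the state saved when the VM is stopped. */
struct model_clock_data {
    uint64_t clock;     /* guest kvm-clock in ns at save time (T1) */
    uint32_t flags;     /* MODEL_CLOCK_REALTIME if 'realtime' is valid */
    uint64_t realtime;  /* host CLOCK_REALTIME in ns at save time (R1) */
};

/*
 * Return the guest clock (T2) that results from restoring 'd' when the host
 * realtime is now_real_ns (R2).
 *
 * With the flag set, the clock advances by the elapsed realtime (R2 - R1).
 * Without it, or if realtime went backwards, the saved value is reused.
 */
uint64_t model_set_clock(const struct model_clock_data *d, uint64_t now_real_ns)
{
    uint64_t clock = d->clock;

    if ((d->flags & MODEL_CLOCK_REALTIME) && now_real_ns > d->realtime) {
        clock += now_real_ns - d->realtime;
    }
    return clock;
}
```

With clock=T1 and realtime=R1 saved at 'stop' and now_real_ns=R2 at 'cont', the
flag gives (R2 - R1 == T2 - T1); without the flag the result is (T1 == T2).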


This is a prototype to demonstrate how QEMU can count elapsed downtime by taking
advantage of 'KVM_CLOCK_REALTIME'.

From b97a514ac227645010ce3d1012af3a4943413844 Mon Sep 17 00:00:00 2001
From: Dongli Zhang <dongli.zhang@oracle.com>
Date: Thu, 18 Sep 2025 14:59:42 -0700
Subject: [PATCH 1/1] target/i386/kvm: take advantage of KVM_CLOCK_REALTIME

The Linux kernel commit c68dc1b577ea ("KVM: x86: Report host tsc and
realtime values in KVM_GET_CLOCK") introduced the 'realtime' field and
KVM_CLOCK_REALTIME.

The 'realtime' value is saved through KVM_GET_CLOCK and restored via
KVM_SET_CLOCK. This enables the KVM clock to advance by the amount of
realtime that elapsed during downtime in operations such as live migration,
stop/cont, and savevm/loadvm.

This patch/feature allows QEMU to take advantage of KVM_CLOCK_REALTIME.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 hw/i386/kvm/clock.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index f56382717f..906346ce2f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -38,6 +38,8 @@ struct KVMClockState {
     /*< public >*/

     uint64_t clock;
+    uint64_t realtime;
+    uint32_t flags;
     bool clock_valid;

     /* whether the 'clock' value was obtained in the 'paused' state */
@@ -107,7 +109,10 @@ static void kvm_update_clock(KVMClockState *s)
         fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
                 abort();
     }
+
     s->clock = data.clock;
+    s->flags = data.flags & KVM_CLOCK_REALTIME;
+    s->realtime = data.realtime;

     /* If kvm_has_adjust_clock_stable() is false, KVM_GET_CLOCK returns
      * essentially CLOCK_MONOTONIC plus a guest-specific adjustment.  This
@@ -186,6 +191,11 @@ static void kvmclock_vm_state_change(void *opaque, bool running,
         s->clock_valid = false;

         data.clock = s->clock;
+        if (s->flags & KVM_CLOCK_REALTIME) {
+            data.flags = s->flags;
+            data.realtime = s->realtime;
+        }
+
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_SET_CLOCK failed: %s\n", strerror(-ret));
@@ -259,6 +269,7 @@ static int kvmclock_pre_load(void *opaque)
     KVMClockState *s = opaque;

     s->clock_is_reliable = false;
+    s->flags = 0;

     return 0;
 }
@@ -290,12 +301,14 @@ static int kvmclock_pre_save(void *opaque)

 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .pre_load = kvmclock_pre_load,
     .pre_save = kvmclock_pre_save,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_UINT64(realtime, KVMClockState),
+        VMSTATE_UINT32(flags, KVMClockState),
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * const []) {
--
2.39.3




Taking advantage of 'KVM_VCPU_TSC_OFFSET' can further improve 'guest_tsc'.
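
A rough sketch of the arithmetic a VMM might use for the TSC offset is below.
It assumes the recipe in the kernel's vCPU device documentation for
KVM_VCPU_TSC_OFFSET (worth re-checking against the kernel tree): the new
offset is the old offset, plus the kvm-clock time elapsed converted to TSC
cycles, plus the difference between the source and destination host TSCs. All
function and parameter names are hypothetical, and no KVM ioctl is issued.

```c
#include <stdint.h>

/* Convert nanoseconds to TSC cycles without overflowing 64 bits. */
uint64_t model_ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
    /* cycles = ns * tsc_khz / 1e6, split to keep intermediates small */
    return (ns / 1000000u) * tsc_khz + (ns % 1000000u) * tsc_khz / 1000000u;
}

/*
 * ofs_src:       guest TSC offset recorded before the blackout
 * guest_src_ns:  kvm-clock value recorded before the blackout
 * guest_dest_ns: kvm-clock value read back after it was restored
 * tsc_src:       host TSC sampled together with guest_src_ns
 * tsc_dest:      host TSC sampled together with guest_dest_ns
 * tsc_khz:       guest TSC frequency in kHz
 *
 * Offsets are treated as 64-bit values with wrap-around arithmetic.
 */
uint64_t model_dest_tsc_offset(uint64_t ofs_src,
                               uint64_t guest_src_ns, uint64_t guest_dest_ns,
                               uint64_t tsc_src, uint64_t tsc_dest,
                               uint64_t tsc_khz)
{
    uint64_t elapsed_ns = 0;

    if (guest_dest_ns > guest_src_ns) {
        elapsed_ns = guest_dest_ns - guest_src_ns;
    }
    return ofs_src + model_ns_to_cycles(elapsed_ns, tsc_khz)
                   + (tsc_src - tsc_dest);
}
```

For stop/cont on a single host whose TSC kept running at the guest frequency,
the two correction terms cancel and the offset is unchanged, which matches the
'tsc' clocksource behaviour shown earlier. The terms only differ when the
destination host TSC is unrelated to the source one, as in live migration.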

Any suggestions on whether kvm-clock/guest_tsc should stop or continue counting
during the blackout? Does QEMU have any expectation or requirement here?

Thank you very much!

Dongli Zhang


