All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Ming Lin <minggr@gmail.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org,
	 David Woodhouse <dwmw2@infradead.org>
Subject: Re: pvclock time drifting backward
Date: Mon, 24 Mar 2025 15:54:32 -0700	[thread overview]
Message-ID: <Z-HiqG_uk0-f6Ry1@google.com> (raw)
In-Reply-To: <facda6e2-3655-4f2c-9013-ebb18d0e6972@gmail.com>

+David

On Wed, Mar 19, 2025, Ming Lin wrote:
> Hi,
> 
> After performing a live migration on a QEMU guest OS that had been running for over 30 days,
> we noticed that the guest OS time was more than 2 seconds behind the actual time.
> 
> After extensive debugging, we found that this issue is related to master_kernel_ns and master_cycle_now.
> 
> When the guest OS starts, the host initializes a pair of master_kernel_ns and master_cycle_now values.
> After live migration, the host updates these values.
> 
> Our debugging showed that if the host does not update master_kernel_ns/master_cycle_now,
> the guest OS time remains correct.
> 
> To illustrate how updating master_kernel_ns/master_cycle_now leads to the guest OS time drifting backward,
> we applied the following debug patch:
> 
> The patch adds a KVM debugfs entry to trigger time calculations and print the results.
> The patch runs on the host side, but we use __pvclock_read_cycles() to simulate the guest OS updating its time.
> 
> Example Output:
> 
> # cat /sys/kernel/debug/kvm/946-13/pvclock
> old: master_kernel_ns: 15119778316
> old: master_cycle_now: 37225912658
> old: ns: 1893199569691
> new: master_kernel_ns: 1908210098649
> new: master_cycle_now: 4391329912268
> new: ns: 1893199548401
> 
> tsc 4391329912368
> kvmclock_offset -15010550291
> diff: ns: 21290
> 
> Explanation of Parameters:
> 
> Input:
> "old: master_kernel_ns:" The master_kernel_ns value recorded when the guest OS started (remains unchanged during testing).
> "old: master_cycle_now:" The master_cycle_now value recorded when the guest OS started (remains unchanged during testing).
> "new: master_kernel_ns:" The latest master_kernel_ns value at the time of reading.
> "new: master_cycle_now:" The latest master_cycle_now value at the time of reading.
> tsc: The rdtsc() value at the time of reading.
> kvmclock_offset: The offset recorded by KVM_SET_CLOCK when the guest OS started (remains unchanged during testing).
> 
> Output:
> "old: ns:" Time in nanoseconds calculated using the old master_kernel_ns/master_cycle_now.
> "new: ns:" Time in nanoseconds calculated using the new master_kernel_ns/master_cycle_now.
> "diff: ns:" (old ns - new ns), representing the time drift relative to the guest OS start time.
> 
> Test Script:
> #!/bin/bash
> 
> qemu_pid=$(pidof qemu-system-x86_64)
> 
> while [ 1 ] ; do
>     echo "====================================="
>     echo "Guest OS running time: $(ps -p $qemu_pid -o etime= | awk '{print $1}')"
>     cat /sys/kernel/debug/kvm/*/pvclock
>     echo
>     sleep 10
> done
> 
> Test Results:
> Below are the first and last parts of a >2-hour test run.
> As time progresses, the time drift calculated using the latest master_kernel_ns/master_cycle_now increases monotonically.
> 
> After 2 hours and 18 minutes, the guest OS time drifted by approximately 93 milliseconds.
> 
> I have uploaded an image for a more intuitive visualization of the time drift:
> https://postimg.cc/crCDWtD7
> 
> Is this a real problem?

David can confirm, but I'm pretty sure the drift you are observing is addressed
by David's series to fix a plethora of kvmclock warts.

https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org

> If there is any fix patch, I’d be happy to test it. Thanks!
> 
> 
>     1 =====================================
>     2 guest os running time: 00:50
>     3 old: master_kernel_ns: 15119778316
>     4 old: master_cycle_now: 37225912658
>     5 old: ns: 48092694964
>     6 new: master_kernel_ns: 63103244699
>     7 new: master_cycle_now: 147587790614
>     8 new: ns: 48092694425
>     9
>    10 tsc 147587790654
>    11 kvmclock_offset -15010550291
>    12 diff: ns: 539
>    13
>    14 =====================================
>    15 guest os running time: 01:00
>    16 old: master_kernel_ns: 15119778316
>    17 old: master_cycle_now: 37225912658
>    18 old: ns: 58139026532
>    19 new: master_kernel_ns: 73149576143
>    20 new: master_cycle_now: 170694333104
>    21 new: ns: 58139025879
>    22
>    23 tsc 170694333168
>    24 kvmclock_offset -15010550291
>    25 diff: ns: 653
>    26
>    27 =====================================
>    28 guest os running time: 01:10
>    29 old: master_kernel_ns: 15119778316
>    30 old: master_cycle_now: 37225912658
>    31 old: ns: 68183772122
>    32 new: master_kernel_ns: 83194321616
>    33 new: master_cycle_now: 193797227862
>    34 new: ns: 68183771357
>    35
>    36 tsc 193797227936
>    37 kvmclock_offset -15010550291
>    38 diff: ns: 765
>    39
>    40 =====================================
>    41 guest os running time: 01:20
>    42 old: master_kernel_ns: 15119778316
>    43 old: master_cycle_now: 37225912658
>    44 old: ns: 78225289157
>    45 new: master_kernel_ns: 93235838545
>    46 new: master_cycle_now: 216892696976
>    47 new: ns: 78225288279
>    48
>    49 tsc 216892697034
>    50 kvmclock_offset -15010550291
>    51 diff: ns: 878
>    52
>    53 =====================================
>    54 guest os running time: 01:30
>    55 old: master_kernel_ns: 15119778316
>    56 old: master_cycle_now: 37225912658
>    57 old: ns: 88268955340
>    58 new: master_kernel_ns: 103279504612
>    59 new: master_cycle_now: 239993109102
>    60 new: ns: 88268954349
>    61
>    62 tsc 239993109168
>    63 kvmclock_offset -15010550291
>    64 diff: ns: 991
>    65
>    66 =====================================
>    67 guest os running time: 01:40
>    68 old: master_kernel_ns: 15119778316
>    69 old: master_cycle_now: 37225912658
>    70 old: ns: 98313212581
>    71 new: master_kernel_ns: 113323761740
>    72 new: master_cycle_now: 263094880668
>    73 new: ns: 98313211476
>    74
>    75 tsc 263094880732
>    76 kvmclock_offset -15010550291
>    77 diff: ns: 1105
> .....
> .....
> 10160 =====================================
> 10161 guest os running time: 02:17:11
> 10162 old: master_kernel_ns: 15119778316
> 10163 old: master_cycle_now: 37225912658
> 10164 old: ns: 8229817213297
> 10165 new: master_kernel_ns: 8244827670997
> 10166 new: master_cycle_now: 18965537819524
> 10167 new: ns: 8229817120748
> 10168
> 10169 tsc 18965537819622
> 10170 kvmclock_offset -15010550291
> 10171 diff: ns: 92549
> 10172 =====================================
> 10173 guest os running time: 02:17:21
> 10174 old: master_kernel_ns: 15119778316
> 10175 old: master_cycle_now: 37225912658
> 10176 old: ns: 8239861074959
> 10177 new: master_kernel_ns: 8254871532564
> 10178 new: master_cycle_now: 18988638681302
> 10179 new: ns: 8239860982297
> 10180
> 10181 tsc 18988638681358
> 10182 kvmclock_offset -15010550291
> 10183 diff: ns: 92662
> 10184 =====================================
> 10185 guest os running time: 02:17:31
> 10186 old: master_kernel_ns: 15119778316
> 10187 old: master_cycle_now: 37225912658
> 10188 old: ns: 8249904622988
> 10189 new: master_kernel_ns: 8264915080459
> 10190 new: master_cycle_now: 19011738821632
> 10191 new: ns: 8249904530213
> 10192
> 10193 tsc 19011738821736
> 10194 kvmclock_offset -15010550291
> 10195 diff: ns: 92775^@
> 10196 =====================================
> 10197 guest os running time: 02:17:41
> 10198 old: master_kernel_ns: 15119778316
> 10199 old: master_cycle_now: 37225912658
> 10200 old: ns: 8259949369203
> 10201 new: master_kernel_ns: 8274959826576
> 10202 new: master_cycle_now: 19034841717872
> 10203 new: ns: 8259949276315
> 10204
> 10205 tsc 19034841717942
> 10206 kvmclock_offset -15010550291
> 10207 diff: ns: 92888
> 10208 =====================================
> 10209 guest os running time: 02:17:51
> 10210 old: master_kernel_ns: 15119778316
> 10211 old: master_cycle_now: 37225912658
> 10212 old: ns: 8269996849598
> 10213 new: master_kernel_ns: 8285007306846
> 10214 new: master_cycle_now: 19057950902658
> 10215 new: ns: 8269996756597
> 10216
> 10217 tsc 19057950902756
> 10218 kvmclock_offset -15010550291
> 10219 diff: ns: 93001^@
> 10220 =====================================
> 10221 guest os running time: 02:18:02
> 10222 old: master_kernel_ns: 15119778316
> 10223 old: master_cycle_now: 37225912658
> 10224 old: ns: 8280039094317
> 10225 new: master_kernel_ns: 8295049551453
> 10226 new: master_cycle_now: 19081048045430
> 10227 new: ns: 8280039001203
> 10228
> 10229 tsc 19081048045526
> 10230 kvmclock_offset -15010550291
> 10231 diff: ns: 93114^@
> 
> 
> 
>     pvclock debugfs patch
> ---
>  arch/x86/include/asm/kvm_host.h |  4 +++
>  arch/x86/kvm/x86.c              | 29 +++++++++++++++-
>  b.sh                            |  1 +
>  virt/kvm/kvm_main.c             | 75 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32ae3aa50c7e..5a82a69bfe7a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1411,6 +1411,10 @@ struct kvm_arch {
>      struct delayed_work kvmclock_update_work;
>      struct delayed_work kvmclock_sync_work;
> +    u64 old_master_kernel_ns;
> +    u64 old_master_cycle_now;
> +    s64 old_kvmclock_offset;
> +
>      struct kvm_xen_hvm_config xen_hvm_config;
>      /* reads protected by irq_srcu, writes by irq_lock */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4b64ab350bcd..a56511ed8c5b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2819,7 +2819,7 @@ static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
>   * As with get_kvmclock_base_ns(), this counts from boot time, at the
>   * frequency of CLOCK_MONOTONIC_RAW (hence adding gtos->offs_boot).
>   */
> -static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
> +int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
>  {
>      struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
>      unsigned long seq;
> @@ -2861,6 +2861,27 @@ static int do_monotonic(s64 *t, u64 *tsc_timestamp)
>      return mode;
>  }
> +u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc)
> +{
> +        struct pvclock_vcpu_time_info hv_clock;
> +        u64 ret;
> +
> +        hv_clock.tsc_timestamp = master_cycle_now;
> +        hv_clock.system_time = master_kernel_ns + kvmclock_offset;
> +
> +        /* both __this_cpu_read() and rdtsc() should be on the same cpu */
> +        get_cpu();
> +
> +        kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> +                                   &hv_clock.tsc_shift,
> +                                   &hv_clock.tsc_to_system_mul);
> +        ret = __pvclock_read_cycles(&hv_clock, tsc);
> +
> +        put_cpu();
> +
> +        return ret;
> +}
> +
>  static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
>  {
>      struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
> @@ -2988,6 +3009,10 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>      host_tsc_clocksource = kvm_get_time_and_clockread(
>                      &ka->master_kernel_ns,
>                      &ka->master_cycle_now);
> +    ka->old_master_kernel_ns = ka->master_kernel_ns;
> +    ka->old_master_cycle_now = ka->master_cycle_now;
> +    printk("MYDEBUG: old_master_kernel_ns = %llu, old_master_cycle_now = %llu\n",
> +            ka->old_master_kernel_ns, ka->old_master_cycle_now);
>      ka->use_master_clock = host_tsc_clocksource && vcpus_matched
>                  && !ka->backwards_tsc_observed
> @@ -6989,6 +7014,8 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
>      else
>          now_raw_ns = get_kvmclock_base_ns();
>      ka->kvmclock_offset = data.clock - now_raw_ns;
> +    ka->old_kvmclock_offset = ka->kvmclock_offset;
> +    printk("MYDEBUG: old_kvmclock_offset = %lld\n", ka->old_kvmclock_offset);
>      kvm_end_pvclock_update(kvm);
>      return 0;
>  }
> diff --git a/b.sh b/b.sh
> new file mode 120000
> index 000000000000..0ff9a93fd53f
> --- /dev/null
> +++ b/b.sh
> @@ -0,0 +1 @@
> +/home/mlin/build.upstream/b.sh
> \ No newline at end of file
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ba0327e2d0d3..d6b9a6e7275e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -399,6 +399,7 @@ int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity,
>              return mc->nobjs >= min ? 0 : -ENOMEM;
>          mc->objects[mc->nobjs++] = obj;
>      }
> +
>      return 0;
>  }
> @@ -998,6 +999,78 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
>      }
>  }
> +extern int do_kvmclock_base(s64 *t, u64 *tsc_timestamp);
> +extern u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc);
> +
> +static ssize_t kvm_mydebug_pvclock_read(struct file *file, char __user *buf,
> +                                size_t len, loff_t *ppos)
> +{
> +    struct kvm *kvm = file->private_data;
> +    struct kvm_arch *ka;
> +    char buffer[256];
> +    ssize_t ret, copied;
> +    u64 new_master_kernel_ns;
> +    u64 new_master_cycle_now;
> +    u64 old_ns, new_ns;
> +    u64 tsc;
> +
> +    if (!kvm) {
> +        pr_err("file->private_data is NULL\n");
> +        return -EINVAL;
> +    }
> +
> +    ka = &kvm->arch;
> +
> +    do_kvmclock_base(&new_master_kernel_ns, &new_master_cycle_now);
> +
> +    tsc = rdtsc();
> +
> +    old_ns = mydebug_get_kvmclock_ns(ka->old_master_kernel_ns, ka->old_master_cycle_now, ka->old_kvmclock_offset, tsc);
> +    new_ns = mydebug_get_kvmclock_ns(new_master_kernel_ns, new_master_cycle_now, ka->old_kvmclock_offset, tsc);
> +
> +    ret = snprintf(buffer, sizeof(buffer),
> +                   "old: master_kernel_ns: %llu\n"
> +                   "old: master_cycle_now: %llu\n"
> +                   "old: ns: %llu\n"
> +                   "new: master_kernel_ns: %llu\n"
> +                   "new: master_cycle_now: %llu\n"
> +                   "new: ns: %llu\n\n"
> +                   "tsc %llu\n"
> +                   "kvmclock_offset %lld\n"
> +                   "diff: ns: %lld\n",
> +                   ka->old_master_kernel_ns, ka->old_master_cycle_now, old_ns,
> +                   new_master_kernel_ns, new_master_cycle_now, new_ns,
> +                   tsc, ka->old_kvmclock_offset,
> +                  old_ns - new_ns
> +                  );
> +
> +    if (ret < 0)
> +        return ret;
> +
> +    if ((size_t)ret > sizeof(buffer))
> +        ret = sizeof(buffer);
> +
> +    if (*ppos >= ret)
> +        return 0; /* EOF */
> +
> +    copied = min(len, (size_t)(ret - *ppos));
> +
> +    if (copy_to_user(buf, buffer + *ppos, copied)) {
> +        pr_err("copy_to_user failed\n");
> +        return -EFAULT;
> +    }
> +
> +    *ppos += copied;
> +
> +    return copied;
> +}
> +
> +static const struct file_operations kvm_pvclock_fops = {
> +    .owner = THIS_MODULE,
> +    .read = kvm_mydebug_pvclock_read,
> +    .open = simple_open,
> +};
> +
>  static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
>  {
>      static DEFINE_MUTEX(kvm_debugfs_lock);
> @@ -1063,6 +1136,8 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
>                      &stat_fops_per_vm);
>      }
> +    debugfs_create_file("pvclock", 0444, kvm->debugfs_dentry, kvm, &kvm_pvclock_fops);
> +
>      kvm_arch_create_vm_debugfs(kvm);
>      return 0;
>  out_err:
> 

  reply	other threads:[~2025-03-24 22:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  5:38 pvclock time drifting backward Ming Lin
2025-03-24 22:54 ` Sean Christopherson [this message]
2025-03-25 11:07   ` David Woodhouse
2025-03-26 15:54     ` Ming Lin
2025-03-27  8:10       ` David Woodhouse
2025-03-28 18:30         ` Ming Lin
2025-03-29  8:20           ` David Woodhouse
2026-04-20 17:26             ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-HiqG_uk0-f6Ry1@google.com \
    --to=seanjc@google.com \
    --cc=dwmw2@infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minggr@gmail.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.