qemu-devel.nongnu.org archive mirror
From: David Woodhouse <dwmw2@infradead.org>
To: Dongli Zhang <dongli.zhang@oracle.com>, qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org
Subject: Re: Should QEMU (accel=kvm) kvm-clock/guest_tsc stop counting during downtime blackout?
Date: Tue, 23 Sep 2025 17:26:03 +0100
Message-ID: <71b79d3819b5f5435b7bc7d8c451be0d276e02db.camel@infradead.org>
In-Reply-To: <7d91b34c-36fe-44ee-8a2a-fb00eaebddd8@oracle.com>


On Mon, 2025-09-22 at 12:37 -0700, Dongli Zhang wrote:
> On 9/22/25 11:16 AM, David Woodhouse wrote:
> > Assuming a modern host where the TSC just counts sanely at a consistent
> > rate and never deviates....
> > 
> > No. The PVTI should basically *never* change. Whatever the estimated
> > (not NTP skewed) frequency of the TSC is believed to be, the KVM clock
> > PVTI should indicate that at boot, telling the guest how to convert a
> > TSC value into 'monotonic nanoseconds since boot'. If it ever changes,
> > that's a KVM bug.
> > 
> > It should be saved and restored in precisely its native form, using the
> > KVM_[GS]ET_CLOCK_GUEST I referenced before. For both live update (same
> > host) and live migration (different host).
> > 
> > The TSC should also continue to count at exactly the same rate as the
> > host's TSC at all times. No breaks or discontinuities due to any kind
> > of 'steal time'. For live update that's easy as you just apply the same
> > *offset*. For live migration that's where you have to accept that it
> > depends on clock synchronization between your source and destination
> > hosts, which is probably based on realtime.
> 
> That means:
> 
> - Utilize KVM_[GS]ET_CLOCK_GUEST to avoid forward/backward drift due to the
> change in PVTI data structure (by adjusting 'ka->kvmclock_offset').

Ultimately, for modern hardware, I'd like to kill ka->kvmclock_offset
entirely, but yeah, I think that's how it works right now.
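
For reference, a minimal sketch of the conversion the PVTI describes,
assuming the usual pvclock field names (tsc_timestamp, system_time,
tsc_to_system_mul, tsc_shift). The Pvti class and pvti_to_ns() are
illustrative names only, not a KVM or QEMU interface, and the struct is
a simplified subset. The point is that if these fields are saved and
restored unchanged, a given TSC value maps to the same nanosecond value
before and after:

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


@dataclass(frozen=True)
class Pvti:
    """Simplified subset of the per-vCPU time info (illustrative)."""
    tsc_timestamp: int      # TSC value at the reference point
    system_time: int        # nanoseconds at the reference point
    tsc_to_system_mul: int  # 32-bit fixed-point multiplier (value / 2^32)
    tsc_shift: int          # signed shift applied to the TSC delta first


def pvti_to_ns(p: Pvti, tsc: int) -> int:
    """Convert a TSC reading to nanoseconds using the PVTI fields."""
    delta = (tsc - p.tsc_timestamp) & MASK64
    if p.tsc_shift >= 0:
        delta = (delta << p.tsc_shift) & MASK64
    else:
        delta >>= -p.tsc_shift
    scaled = (delta * p.tsc_to_system_mul) >> 32
    return (p.system_time + scaled) & MASK64


if __name__ == "__main__":
    # A 2 GHz TSC: one tick is 0.5 ns, so mul = 2^31 and shift = 0.
    p = Pvti(tsc_timestamp=1000, system_time=500,
             tsc_to_system_mul=1 << 31, tsc_shift=0)
    print(pvti_to_ns(p, 1000 + 2_000_000))  # 2,000,000 ticks later
```

Nothing in this arithmetic depends on which host the guest is running
on, which is why the structure itself has no reason to change.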

> - Utilize realtime as reference to keep clock/tsc running.

Hm, I don't like talking about 'running' vs. 'stopped'. The clock
should always be running. You try to keep it as *stable* as possible,
even across live migration. And for live migration, realtime is
probably the best you have so it's what you're stuck with.

When the guest reads its TSC, it should always get a value which is
as *close* as possible to its *original* host's TSC, minus the value
that host's TSC had when the guest was first started (ignoring
scaling).
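
A rough sketch of that arithmetic, under stated assumptions: both hosts
run the TSC at the same frequency (so scaling is ignored), and the
source and destination each sample their host TSC together with a TAI
timestamp. All function and parameter names here are hypothetical; this
is not the KVM or QEMU API, just the offset calculation:

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound
NSEC_PER_SEC = 1_000_000_000


def offset_at_launch(host_tsc_at_launch: int) -> int:
    """Offset that makes the guest TSC read 0 at first launch."""
    return (-host_tsc_at_launch) & MASK64


def guest_tsc(host_tsc: int, offset: int) -> int:
    """What the guest sees: host TSC plus a fixed offset."""
    return (host_tsc + offset) & MASK64


def offset_after_migration(src_offset: int,
                           src_host_tsc: int, src_tai_ns: int,
                           dst_host_tsc: int, dst_tai_ns: int,
                           tsc_hz: int) -> int:
    """Pick a destination offset so the guest TSC appears to have kept
    counting at tsc_hz for the TAI time elapsed between the samples."""
    guest_at_src = guest_tsc(src_host_tsc, src_offset)
    elapsed_ticks = (dst_tai_ns - src_tai_ns) * tsc_hz // NSEC_PER_SEC
    expected_guest = (guest_at_src + elapsed_ticks) & MASK64
    return (expected_guest - dst_host_tsc) & MASK64
```

For live update on the same host there is nothing to compute: the
offset from offset_at_launch() is simply reapplied. For live migration
the result is only as good as the clock synchronization between the
two hosts.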




> > 
> > 
> > > 
> > > As demonstrated in my test, currently guest_tsc doesn't stop counting during
> > > blackout because of the lack of "MSR_IA32_TSC put" at
> > > kvmclock_vm_state_change(). Per my understanding, it is a bug and we may need to
> > > fix it.
> > > 
> > > BTW, kvmclock_vm_state_change() already utilizes KVM_SET_CLOCK to re-configure
> > > kvm-clock before continuing the guest VM.

Yeah, right now it's probably just introducing errors for a stop/start
of the VM.

> > > > 
> > > > KVM already lets you restore the TSC correctly. To restore KVM clock
> > > > correctly, you want something like KVM_SET_CLOCK_GUEST from
> > > > https://lore.kernel.org/all/20240522001817.619072-4-dwmw2@infradead.org/
> > > > 
> > > > For cross machine migration, you *do* need to use a realtime clock
> > > > reference as that's the best you have (make sure you use TAI not UTC
> > > > and don't get affected by leap seconds or smearing). Use that to
> > > > restore the *TSC* as well as you can to make it appear to have kept
> > > > running consistently. And then KVM_SET_CLOCK_GUEST just as you would on
> > > > the same host.
> > > 
> > > Indeed QEMU Live Migration also relies on kvmclock_vm_state_change() to
> > > temporarily stop/cont the source/target VM.
> > > 
> > > Would you mean we expect something different for live migration, i.e.,
> > > 
> > > 1. Live Migrate a source VM to a file.
> > > 2. Copy the file to another server.
> > > 3. Wait for 1 hour.
> > > 4. Migrate from the file to target VM.
> > > 
> > > Although it is equivalent to a one-hour downtime, we do need to count the
> > > missing one-hour, correct?
> > 
> > I don't look at it as counting anything. The clock keeps running even
> > when I'm not looking at it. If I wake up and look at it again, there is
> > no 'counting' how long I was asleep...
> > 
> 
> That means:
> 
> - stop/cont: clock/tsc stop running
> - savevm/loadvm: clock/tsc stop running

What does "stop running" even mean here? You can never stop the clock
running. The only thing you can do when you resume a VM is change its
offset, so that it jumps back to an earlier value.
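
To illustrate, a small sketch of what a "stopped" clock amounts to in
practice, assuming the guest TSC is the host TSC plus an offset. The
names are hypothetical and this is not what QEMU or KVM literally do;
it only shows that "stopping" is a rewind applied at resume time:

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def guest_tsc(host_tsc: int, offset: int) -> int:
    """What the guest sees: host TSC plus a fixed offset."""
    return (host_tsc + offset) & MASK64


def offset_hiding_blackout(offset: int,
                           host_tsc_at_pause: int,
                           host_tsc_at_resume: int) -> int:
    """Offset that makes the guest TSC resume from its value at pause,
    i.e. subtracts the blackout from what the guest would have seen."""
    blackout = (host_tsc_at_resume - host_tsc_at_pause) & MASK64
    return (offset - blackout) & MASK64
```

With the offset left alone, a guest paused at host TSC 1000 and resumed
at 4000 reads 4000. With the adjusted offset it reads 1000 again, which
is the jump back to an earlier value.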


