From: David Woodhouse <dwmw2@infradead.org>
To: Dongli Zhang <dongli.zhang@oracle.com>, qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org
Subject: Re: Should QEMU (accel=kvm) kvm-clock/guest_tsc stop counting during downtime blackout?
Date: Tue, 23 Sep 2025 18:47:35 +0100
Message-ID: <2e958c58d1d1f0107475b3d91f7a6f2a28da13de.camel@infradead.org>
In-Reply-To: <bbadb98b-964c-4eaa-8826-441a28e08100@oracle.com>
On Tue, 2025-09-23 at 10:25 -0700, Dongli Zhang wrote:
>
>
> On 9/23/25 9:26 AM, David Woodhouse wrote:
> > On Mon, 2025-09-22 at 12:37 -0700, Dongli Zhang wrote:
> > > On 9/22/25 11:16 AM, David Woodhouse wrote:
>
> [snip]
>
> > > >
> > > > >
> > > > > As demonstrated in my test, currently guest_tsc doesn't stop counting during
> > > > > blackout because of the lack of "MSR_IA32_TSC put" at
> > > > > kvmclock_vm_state_change(). Per my understanding, it is a bug and we may need to
> > > > > fix it.
> > > > >
> > > > > BTW, kvmclock_vm_state_change() already utilizes KVM_SET_CLOCK to re-configure
> > > > > kvm-clock before continuing the guest VM.
> >
> > Yeah, right now it's probably just introducing errors for a stop/start
> > of the VM.
>
> But doesn't that help meet the expectation?
>
> Thanks to KVM_GET_CLOCK and KVM_SET_CLOCK, QEMU saves the clock with
> KVM_GET_CLOCK when the VM is stopped, and restores it with KVM_SET_CLOCK when
> the VM is continued.

It saves the actual *value* of the clock. I would prefer to phrase that
as "it makes the clock jump backwards to the time at which the guest
was paused".

> This ensures that the clock value itself does not change between stop and cont.
>
> However, QEMU does not adjust the TSC offset via MSR_IA32_TSC during stop.
>
> As a result, when execution resumes, the guest TSC suddenly jumps forward.

Oh wow, that seems really broken. If we're going to make it experience
a time warp, we should at least be *consistent*.

So a guest which uses the TSC for timekeeping should be mostly
unaffected by this and its wallclock should still be accurate. A guest
which uses the KVM clock will be hosed by it.

I think we should fix this so that the KVM clock is unaffected too.
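To make the inconsistency concrete, here is a toy model in C. It is a hypothetical sketch, not QEMU or KVM code, and every name in it is invented for illustration. Both guest clocks are modelled as host nanoseconds plus a per-clock offset (a real guest TSC counts ticks and the real KVM clock is scaled from the host TSC). `toy_cont_restore_value()` stands in for the KVM_GET_CLOCK/KVM_SET_CLOCK save/restore described above, with the TSC offset left alone. `toy_cont_keep_counting()` shows the alternative where nothing is adjusted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not QEMU or KVM code: each guest clock is "host time plus an
 * offset", everything in nanoseconds. (A real guest TSC counts ticks and the
 * real KVM clock is scaled from the host TSC; that detail is ignored here.)
 */
struct toy_guest {
    int64_t tsc_offset_ns;      /* guest TSC = host_ns + tsc_offset_ns */
    int64_t kvmclock_offset_ns; /* guest KVM clock = host_ns + kvmclock_offset_ns */
    int64_t saved_kvmclock_ns;  /* KVM clock value captured at stop */
    int64_t stopped_at_ns;      /* host time of the stop, for sanity checks */
};

static int64_t guest_tsc_ns(const struct toy_guest *g, int64_t host_ns)
{
    return host_ns + g->tsc_offset_ns;
}

static int64_t guest_kvmclock_ns(const struct toy_guest *g, int64_t host_ns)
{
    return host_ns + g->kvmclock_offset_ns;
}

/* stop: capture the KVM clock value, as QEMU does with KVM_GET_CLOCK. */
static void toy_stop(struct toy_guest *g, int64_t host_ns)
{
    g->saved_kvmclock_ns = guest_kvmclock_ns(g, host_ns);
    g->stopped_at_ns = host_ns;
}

/*
 * cont, as described in the thread: the KVM clock is put back to the saved
 * value (KVM_SET_CLOCK-style), so it jumps back to the time of the pause,
 * while the TSC offset is untouched and the TSC has kept counting.
 */
static void toy_cont_restore_value(struct toy_guest *g, int64_t host_ns)
{
    assert(host_ns >= g->stopped_at_ns);
    g->kvmclock_offset_ns = g->saved_kvmclock_ns - host_ns;
}

/* cont, "just keep counting": neither offset is changed. */
static void toy_cont_keep_counting(struct toy_guest *g, int64_t host_ns)
{
    assert(host_ns >= g->stopped_at_ns);
}

/* How far the guest's TSC and KVM clock disagree at host time host_ns. */
static int64_t clock_skew_ns(const struct toy_guest *g, int64_t host_ns)
{
    return guest_tsc_ns(g, host_ns) - guest_kvmclock_ns(g, host_ns);
}
```

For example, stopping at host time 10 s and continuing at 70 s with `toy_cont_restore_value()` leaves the KVM clock reading 10 s while the TSC reads 70 s, a 60 s disagreement. With `toy_cont_keep_counting()` both read 70 s and `clock_skew_ns()` returns zero.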
> >
> > > > > >
> > > > > > KVM already lets you restore the TSC correctly. To restore KVM clock
> > > > > > correctly, you want something like KVM_SET_CLOCK_GUEST from
> > > > > > https://lore.kernel.org/all/20240522001817.619072-4-dwmw2@infradead.org/
> > > > > >
> > > > > > For cross machine migration, you *do* need to use a realtime clock
> > > > > > reference as that's the best you have (make sure you use TAI not UTC
> > > > > > and don't get affected by leap seconds or smearing). Use that to
> > > > > > restore the *TSC* as well as you can to make it appear to have kept
> > > > > > running consistently. And then KVM_SET_CLOCK_GUEST just as you would on
> > > > > > the same host.
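The TSC part of the cross-machine procedure quoted above amounts to simple arithmetic. The sketch below is hypothetical, not a QEMU or KVM API. It assumes the source captured the guest TSC together with a TAI timestamp, and that the destination reads TAI the same way just before resuming. On Linux that could be `clock_gettime(CLOCK_TAI, ...)`, which is only correct once the kernel's TAI offset has been set, for example by the NTP or PTP daemon. It also assumes the guest TSC frequency `tsc_khz` is the same on both sides. Loading the result into the guest and then restoring the KVM clock (e.g. with the proposed KVM_SET_CLOCK_GUEST) are not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU or KVM API. tsc_at_save and tai_save_ns
 * were captured together on the source host; tai_resume_ns is read on the
 * destination just before the guest runs again. Returns the guest TSC value
 * to load so that the TSC appears to have kept running at tsc_khz throughout.
 */
static uint64_t guest_tsc_at_resume(uint64_t tsc_at_save, int64_t tai_save_ns,
                                    int64_t tai_resume_ns, uint32_t tsc_khz)
{
    assert(tai_resume_ns >= tai_save_ns);
    uint64_t elapsed_ns = (uint64_t)(tai_resume_ns - tai_save_ns);

    /*
     * ticks = elapsed_ns * tsc_khz / 1,000,000. Whole seconds and the
     * sub-second remainder are converted separately to avoid overflowing
     * 64 bits for long gaps.
     */
    uint64_t secs = elapsed_ns / 1000000000ULL;
    uint64_t rem_ns = elapsed_ns % 1000000000ULL;
    uint64_t ticks = secs * tsc_khz * 1000ULL + rem_ns * tsc_khz / 1000000ULL;

    return tsc_at_save + ticks;
}
```

For the one-hour wait in Dongli's example and a 2.5 GHz guest TSC (`tsc_khz = 2500000`), this advances the saved TSC by 9,000,000,000,000 ticks. The accuracy is bounded by how well the two hosts' TAI clocks agree, which is the "as well as you can" caveat above.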
> > > > >
> > > > > Indeed QEMU Live Migration also relies on kvmclock_vm_state_change() to
> > > > > temporarily stop/cont the source/target VM.
> > > > >
> > > > > Do you mean we expect something different for live migration, i.e.,
> > > > >
> > > > > 1. Live Migrate a source VM to a file.
> > > > > 2. Copy the file to another server.
> > > > > 3. Wait for 1 hour.
> > > > > 4. Migrate from the file to target VM.
> > > > >
> > > > > Although it is equivalent to a one-hour downtime, we do need to count the
> > > > > missing hour, correct?
> > > >
> > > > I don't look at it as counting anything. The clock keeps running even
> > > > when I'm not looking at it. If I wake up and look at it again, there is
> > > > no 'counting' how long I was asleep...
> > > >
> > >
> > > That means:
> > >
> > > - stop/cont: clock/tsc stop running
> > > - savevm/loadvm: clock/tsc stop running
> >
> > What does "stop running" even mean here? You can never stop the clock
> > running. The only thing you can do is change its offset so that it
> > jumps back to an earlier value, when you resume a VM?
> >
>
> Yes, I meant "change its offset so that it jumps back to an earlier value." From
> the VM's perspective, this is equivalent to "the clock was stopped."

Yeah... let's stop doing that :)

> Could you help explain why we treat stop/cont differently from live migration?

We shouldn't. Hardware companies spent *years* learning the lesson
about clocks: that they should just keep counting. At the same rate.
Unconditionally. Even if the CPU is running slower. Or stopped. Or
whatever. Just count. Do not stop counting.

Let's not repeat the same mistakes.