From: David Woodhouse <dwmw2@infradead.org>
To: Dongli Zhang <dongli.zhang@oracle.com>, qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org
Subject: Re: Should QEMU (accel=kvm) kvm-clock/guest_tsc stop counting during downtime blackout?
Date: Thu, 25 Sep 2025 09:44:54 +0100
Message-ID: <acca55a49bad023fad30625fc81e19ef1c3d0ed8.camel@infradead.org>
In-Reply-To: <2cf13be8-cd27-4bfb-af8e-ef33286d633b@oracle.com>
On Wed, 2025-09-24 at 13:53 -0700, Dongli Zhang wrote:
>
>
> On 9/23/25 10:47 AM, David Woodhouse wrote:
> > On Tue, 2025-09-23 at 10:25 -0700, Dongli Zhang wrote:
> > >
> > >
> > > On 9/23/25 9:26 AM, David Woodhouse wrote:
> > > > On Mon, 2025-09-22 at 12:37 -0700, Dongli Zhang wrote:
> > > > > On 9/22/25 11:16 AM, David Woodhouse wrote:
> > >
> > > [snip]
> > >
> > > > > >
> > > > > > >
> > > > > > > As demonstrated in my test, currently guest_tsc doesn't stop counting during
> > > > > > > blackout because of the lack of "MSR_IA32_TSC put" at
> > > > > > > kvmclock_vm_state_change(). Per my understanding, it is a bug and we may need to
> > > > > > > fix it.
> > > > > > >
> > > > > > > BTW, kvmclock_vm_state_change() already utilizes KVM_SET_CLOCK to re-configure
> > > > > > > kvm-clock before continuing the guest VM.
> > > >
> > > > Yeah, right now it's probably just introducing errors for a stop/start
> > > > of the VM.
> > >
> > > But that help can meet the expectation?
> > >
> > > Thanks to KVM_GET_CLOCK and KVM_SET_CLOCK, QEMU saves the clock with
> > > KVM_GET_CLOCK when the VM is stopped, and restores it with KVM_SET_CLOCK when
> > > the VM is continued.
> >
> > It saves the actual *value* of the clock. I would prefer to phrase that
> > as "it makes the clock jump backwards to the time at which the guest
> > was paused".
> >
> > > This ensures that the clock value itself does not change between stop and cont.
> > >
> > > However, QEMU does not adjust the TSC offset via MSR_IA32_TSC during stop.
> > >
> > > As a result, when execution resumes, the guest TSC suddenly jumps forward.
> >
> > Oh wow, that seems really broken. If we're going to make it experience
> > a time warp, we should at least be *consistent*.
> >
> > So a guest which uses the TSC for timekeeping should be mostly
> > unaffected by this and its wallclock should still be accurate. A guest
> > which uses the KVM clock will be hosed by it.
> >
> > I think we should fix this so that the KVM clock is unaffected too.
>
> From my understanding of your reply, the kvm-clock/tsc should always be adjusted
> whenever a QEMU VM is paused and then resumed (i.e. via stop/cont).

I think I agree, except I still hate the way you use the word
'adjusted'.

If I look at my clock, and then go to sleep for a while and look at the
clock again, nobody *adjusts* it. It just keeps running.

That's the effect we should always strive for, and that's how we should
think about it and talk about it.

It's difficult to talk about clocks because what does it mean for a
clock to be "unchanged"? Does it mean that it should return the same
time value? Or that it should continue to count consistently? I would
argue that we should *always* use language which assumes the latter.

Turning to physics for a clumsy analogy, it's about the frame of
reference. We're all on a moving train. I look at you in the seat
opposite me, I go to sleep for a while, and I wake up and you're still
there. Nobody has "adjusted" your position to account for the
movement of the train while I was asleep.
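
To make the two meanings of "unchanged" concrete, here is a toy model,
not QEMU or KVM code. It assumes a guest clock that simply reads as host
time plus a fixed offset; GuestClock, resume_same_value() and
resume_same_relationship() are names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class GuestClock:
    """Toy guest clock: reads as host time plus a fixed offset (ns)."""
    offset_ns: int

    def read(self, host_ns: int) -> int:
        return host_ns + self.offset_ns


def resume_same_value(saved_value_ns: int, host_now_ns: int) -> GuestClock:
    # "Unchanged" read as "returns the same number as at pause time":
    # a new offset is computed, so the clock falls behind by the pause length.
    return GuestClock(offset_ns=saved_value_ns - host_now_ns)


def resume_same_relationship(clock: GuestClock) -> GuestClock:
    # "Unchanged" read as "keeps counting consistently": nothing is recomputed.
    return GuestClock(offset_ns=clock.offset_ns)


clock = GuestClock(offset_ns=-1_000)
pause_host_ns = 5_000_000_000
resume_host_ns = pause_host_ns + 30_000_000_000  # hypothetical 30 s pause

saved = clock.read(pause_host_ns)
warped = resume_same_value(saved, resume_host_ns)
steady = resume_same_relationship(clock)

# How far the "same value" clock lags the clock that just kept running.
lost_ns = steady.read(resume_host_ns) - warped.read(resume_host_ns)
```

In this model lost_ns equals the length of the pause, which is the time
warp being discussed; the second policy involves no arithmetic at all.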
> This applies to:
>
> - stop / cont
> - savevm / loadvm
> - live migration
> - cpr
>
> It is a bug if the clock jumps backwards to the time at which the guest was paused.
>
> The time elapsed while the VM is paused should always be accounted for and
> reflected in kvm-clock/tsc once the VM resumes.

In particular, in *all* but the live migration case, there should be
basically nothing to do. No addition, no subtraction. Only restoring
the *existing* relationships, precisely as they were before. That is
the TSC *offset* value, and the precise TSC→kvmclock parameters, all
bitwise *exactly* the same as before.

And the only thing that changes on live migration is that you have to
set the TSC offset such that the guest sees the values it *would* have
seen on the original host at any given moment in time... and doesn't
know it was kidnapped and moved onto a different train while it was
sleeping...?
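
As a rough sketch of that arithmetic (a toy model, not the QEMU or KVM
implementation): assume the guest TSC reads as host TSC plus offset with
64-bit wraparound, that source and destination hosts share a synchronized
wall clock, and that the guest TSC frequency is the same on both. The
function names and all numbers below are invented for illustration.

```python
MASK64 = (1 << 64) - 1
NSEC_PER_SEC = 1_000_000_000


def guest_tsc(host_tsc: int, tsc_offset: int) -> int:
    # The guest sees the host counter plus a per-VM offset, modulo 2**64.
    return (host_tsc + tsc_offset) & MASK64


def same_host_offset(saved_offset: int) -> int:
    # stop/cont, savevm/loadvm on the same boot of the same host:
    # restore the offset bit for bit, with no addition or subtraction.
    return saved_offset


def migrated_offset(src_host_tsc: int, src_offset: int, src_wall_ns: int,
                    dst_host_tsc: int, dst_wall_ns: int,
                    guest_tsc_hz: int) -> int:
    # Ticks the guest TSC would have advanced on the source host between
    # the source sample and the destination sample.
    elapsed_ticks = (dst_wall_ns - src_wall_ns) * guest_tsc_hz // NSEC_PER_SEC
    expected = guest_tsc(src_host_tsc, src_offset) + elapsed_ticks
    # Pick the offset that makes the destination host TSC read as that value.
    return (expected - dst_host_tsc) & MASK64


# Hypothetical samples: guest TSC is 6e9 on the source, then 0.5 s passes.
hz = 2_000_000_000
src_offset = -4_000_000_000 & MASK64
src_wall_ns = 1_000_000_000_000
dst_wall_ns = src_wall_ns + 500_000_000
dst_host_tsc = 123_456_789

new_offset = migrated_offset(10_000_000_000, src_offset, src_wall_ns,
                             dst_host_tsc, dst_wall_ns, hz)
seen_on_dst = guest_tsc(dst_host_tsc, new_offset)
```

Here seen_on_dst is the 6e9 ticks the guest had on the source plus the
1e9 ticks that elapsed in flight, regardless of what the destination
host's own counter happens to read.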