Re: [PATCH 2/3] KVM: x86: conditionally clear KVM_REQ_MASTERCLOCK_UPDATE at the end of KVM_SET_CLOCK

Kernel KVM virtualization development
 help / color / mirror / Atom feed

From: David Woodhouse <dwmw2@infradead.org>
To: Dongli Zhang <dongli.zhang@oracle.com>, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, paul@xen.org,
	tglx@kernel.org,  mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,  hpa@zytor.com,
	linux-kernel@vger.kernel.org, joe.jin@oracle.com
Subject: Re: [PATCH 2/3] KVM: x86: conditionally clear KVM_REQ_MASTERCLOCK_UPDATE at the end of KVM_SET_CLOCK
Date: Tue, 12 May 2026 08:19:39 +0100	[thread overview]
Message-ID: <a1ab80b4fb24677e1007f3be411e2954e88a36cf.camel@infradead.org> (raw)
In-Reply-To: <3363ed64-75f9-4a5c-93b9-882d72bfb7f2@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 5887 bytes --]

On Mon, 2026-05-11 at 17:21 -0700, Dongli Zhang wrote:
> 
> 
> On 5/9/26 1:04 PM, David Woodhouse wrote:
> > On Thu, 2026-01-15 at 12:22 -0800, Dongli Zhang wrote:
> > > The KVM_SET_CLOCK command calls pvclock_update_vm_gtod_copy() to update the
> > > masterclock data.
> > > 
> > > Many vCPUs may already have KVM_REQ_MASTERCLOCK_UPDATE pending before
> > > KVM_SET_CLOCK is invoked. If pvclock_update_vm_gtod_copy() decides to use
> > > the masterclock, there is no need to update the masterclock multiple times
> > > afterward. As noted in commit c52ffadc65e2 ("KVM: x86: Don't unnecessarily
> > > force masterclock update on vCPU hotplug"), each unnecessary
> > > KVM_REQ_MASTERCLOCK_UPDATE can cause the kvm-clock time to jump.
> > > 
> > > Therefore, clear KVM_REQ_MASTERCLOCK_UPDATE for each vCPU at the end of
> > > KVM_SET_CLOCK when the master clock is active. The 'tsc_write_lock' ensures
> > > that only requests issued before KVM_SET_CLOCK are cleared.
> > 
> > Hm, I think we should do this in kvm_make_mclock_inprogress_request()
> > instead.
> > 
> > Currently, things like pvclock_gtod_update_fn() and one or two others
> > set KVM_REQ_MASTERCLOCK_UPDATE for all vCPUs. Not just one, because we
> > don't want *any* vCPU going back into guest mode before doing the work.
> > 
> > So they could *all* end up calling into kvm_masterclock_update() at the
> > same time. I have a vague recollection there was once a mutex there,
> > but now there isn't.
> > 
> > So all vCPUs can each, in parallel, set KVM_REQ_MCLOCK_INPROGRESS on
> > all other vCPUs. That's an unhandled request which just makes a vCPU
> > spin (preventing it from entering the guest until the clock update
> > completes).
> > 
> > Then each vCPU tries to take the kvm->arch.tsc_write_lock in
> > kvm_start_pvclock_update(), and from this point they'll be
> > serialized... but every vCPU will go through the whole of
> > pvclock_update_vm_gtod_copy() for itself, updating the masterclock one
> > more time for each vCPU in the system?
> > 
> > I came here to say that kvm_make_mclock_inprogress_request() should
> > probably clear KVM_REQ_MASTERCLOCK_UPDATE on all other vCPUs, since
> > it's about to do the work, and the other vCPUs will no longer need to.
> > 
> > But... it's more broken than that now. 
> > 
> > I think maybe vcpu_enter_guest() should call kvm_update_masterclock()
> > if *kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE)* (i.e. not *clear* the
> > bit by using kvm_check_request()).
> > 
> > And then after getting ->arch.tsc_write_lock,
> > kvm_start_pvclock_update() should check if KVM_REQ_MASTERCLOCK_UPDATE
> > is *still* set, and only proceed with the update if it is?
> > 
> > I'll play with it some more... something like this perhaps?
> > 
> > From 91e37d74ee267c722bc2c8f26754fcdfbc040e29 Mon Sep 17 00:00:00 2001
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > Date: Sat, 9 May 2026 16:54:01 +0100
> > Subject: [PATCH] KVM: x86: Avoid redundant masterclock updates from multiple
> >  vCPUs
> > 
> > When a masterclock update is triggered (e.g. by the clocksource change
> > notifier), KVM_REQ_MASTERCLOCK_UPDATE is set on all vCPUs. Without this
> > fix, each vCPU independently processes the request and redundantly
> > re-executes the entire pvclock_update_vm_gtod_copy() sequence, serialized
> > only by tsc_write_lock. Each redundant re-snapshot of the master clock
> > reference point introduces potential clock drift.
> > 
> > Fix this by having __kvm_start_pvclock_update() check, after acquiring
> > the lock, whether the requesting vCPU's KVM_REQ_MASTERCLOCK_UPDATE is
> > still set. If another vCPU already did the update and cleared it, bail
> > out. Otherwise, clear the request on all other vCPUs before proceeding.
> > 
> > The caller in vcpu_enter_guest() now uses kvm_test_request() (non-clearing)
> > since the clearing is done inside __kvm_start_pvclock_update() under the
> > lock.
> > 
> > Suggested-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> 
> Thank you very much! Feel free to add me as Suggested-by or Reported-by.

Already did!

> I will also help test your large patch series.

Thanks.

> Regarding the issue, I would like to confirm whether there can still be any
> KVM_REQ_MASTERCLOCK_UPDATE after KVM_SET_CLOCK_GUEST.
> 
> Per my understanding, I would expect KVM_SET_CLOCK_GUEST to be the last point
> where the masterclock can be updated (during live migration or live update).
> 
> After KVM_SET_CLOCK_GUEST, any further masterclock change may cause kvm-clock drift.

I believe that for a guest with ->use_master_clock, there should be no
further masterclock change once it's running. We trigger
KVM_REQ_MASTER_CLOCK update only in exceptional cases: 

 • kvm_set_guest_paused() — vCPU 0 writes kvmclock MSR switching
   between old/new MSR (boot_vcpu_runs_old_kvmclock changes)

 • kvm_track_tsc_matching() — TSC matching state changes: either a new
   TSC generation starts, or master clock needs to be toggled on/off

 • pvclock_gtod_notify() via irq_work — host clocksource stops being
   TSC-based (e.g., TSC marked unstable, clocksource switched to HPET)

 • kvm_arch_hardware_enable() — host TSC went backwards during CPU
   online (suspend/resume, CPU hotplug) setting backwards_tsc_observed

IIRC the timekeeping code does call pvclock_gtod_notify *all* the time
even when the clock rate is stable and NTP isn't applying any skew,
because a cycle rate of e.g. 333333⅓ cycles per tick will result in it
counting 333333, 333333, then 333334 cycles in consecutive ticks. And
calling all the notifiers whenever it flips between 333333 and 333334.
But KVM only updates the master clock if it becomes non-TSC-based, so
it doesn't hurt.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

next prev parent reply	other threads:[~2026-05-12  7:19 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 20:22 [PATCH 0/3] KVM: x86: Mitigate kvm-clock drift caused by masterclock update Dongli Zhang
2026-01-15 20:22 ` [PATCH 1/3] KVM: x86: Fix compute_guest_tsc() to cope with negative delta Dongli Zhang
2026-01-15 20:22 ` [PATCH 2/3] KVM: x86: conditionally clear KVM_REQ_MASTERCLOCK_UPDATE at the end of KVM_SET_CLOCK Dongli Zhang
2026-05-09 20:04   ` David Woodhouse
2026-05-12  0:21     ` Dongli Zhang
2026-05-12  7:19       ` David Woodhouse [this message]
2026-01-15 20:22 ` [PATCH 3/3] KVM: x86: conditionally update masterclock data in pvclock_update_vm_gtod_copy() Dongli Zhang
2026-05-09 12:22   ` David Woodhouse
2026-05-12  0:16     ` Dongli Zhang
2026-05-12  5:21       ` David Woodhouse
2026-05-12 23:23         ` Dongli Zhang
2026-01-15 20:37 ` [PATCH 0/3] KVM: x86: Mitigate kvm-clock drift caused by masterclock update Dongli Zhang
2026-01-15 21:13   ` David Woodhouse
2026-01-16  9:31     ` Dongli Zhang
2026-01-22  5:01       ` Dongli Zhang
2026-01-24  1:31       ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1ab80b4fb24677e1007f3be411e2954e88a36cf.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=hpa@zytor.com \
    --cc=joe.jin@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox