From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross,
	Boris Ostrovsky, David Woodhouse, Paul Durrant, Jonathan Cameron,
	Sascha Bischoff, Marc Zyngier, Joey Gouly, Jack Allister,
	Dongli Zhang, joe.jin@oracle.com, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v4 26/30] KVM: x86: Avoid redundant masterclock updates from multiple vCPUs
Date: Sat, 9 May 2026 23:46:52 +0100
Message-ID: <20260509224824.3264567-27-dwmw2@infradead.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260509224824.3264567-1-dwmw2@infradead.org>
References: <20260509224824.3264567-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: David Woodhouse

When a masterclock update is triggered (e.g.
by the clocksource change notifier), KVM_REQ_MASTERCLOCK_UPDATE is set
on all vCPUs. Without this fix, each vCPU independently processes the
request and redundantly re-executes the entire
pvclock_update_vm_gtod_copy() sequence, serialized only by
tsc_write_lock. Each redundant re-snapshot of the master clock
reference point introduces potential clock drift.

Fix this by having __kvm_start_pvclock_update() check, after acquiring
the lock, whether the requesting vCPU's KVM_REQ_MASTERCLOCK_UPDATE is
still set. If another vCPU already did the update and cleared it, bail
out. Otherwise, clear the request on all other vCPUs before proceeding.

The caller in vcpu_enter_guest() now uses kvm_test_request()
(non-clearing) since the clearing is done inside
__kvm_start_pvclock_update() under the lock.

Suggested-by: Dongli Zhang
Signed-off-by: David Woodhouse
---
 arch/x86/kvm/x86.c | 56 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d9ec0638d28..77dfd4455a4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3259,10 +3259,39 @@ static void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
 }
 
-static void __kvm_start_pvclock_update(struct kvm *kvm)
+static void kvm_clear_mclock_inprogress_request(struct kvm *kvm)
 {
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+}
+
+static bool __kvm_start_pvclock_update(struct kvm *kvm, struct kvm_vcpu *requesting_vcpu)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
 	raw_spin_lock_irq(&kvm->arch.tsc_write_lock);
+
+	/*
+	 * If another vCPU already did the update while we were waiting
+	 * for the lock, our request will have been cleared. Bail out.
+	 */
+	if (requesting_vcpu &&
+	    !kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE, requesting_vcpu)) {
+		kvm_clear_mclock_inprogress_request(kvm);
+		raw_spin_unlock_irq(&kvm->arch.tsc_write_lock);
+		return false;
+	}
+
+	/* The update is VM-wide; prevent other vCPUs from redoing it. */
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_clear_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
+
 	write_seqcount_begin(&kvm->arch.pvclock_sc);
+	return true;
 }
 
 static void kvm_start_pvclock_update(struct kvm *kvm)
@@ -3270,7 +3299,7 @@ static void kvm_start_pvclock_update(struct kvm *kvm)
 	kvm_make_mclock_inprogress_request(kvm);
 
 	/* no guest entries from this point */
-	__kvm_start_pvclock_update(kvm);
+	__kvm_start_pvclock_update(kvm, NULL);
 }
 
 static void kvm_end_pvclock_update(struct kvm *kvm)
@@ -3279,22 +3308,25 @@ static void kvm_end_pvclock_update(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
 
-	write_seqcount_end(&ka->pvclock_sc);
-	raw_spin_unlock_irq(&ka->tsc_write_lock);
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
 	/* guest entries allowed */
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+	kvm_clear_mclock_inprogress_request(kvm);
+
+	write_seqcount_end(&ka->pvclock_sc);
+	raw_spin_unlock_irq(&ka->tsc_write_lock);
 }
 
-static void kvm_update_masterclock(struct kvm *kvm)
+static void kvm_update_masterclock(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
 	kvm_hv_request_tsc_page_update(kvm);
-	kvm_start_pvclock_update(kvm);
-	pvclock_update_vm_gtod_copy(kvm);
-	kvm_end_pvclock_update(kvm);
+	kvm_make_mclock_inprogress_request(kvm);
+
+	if (__kvm_start_pvclock_update(kvm, vcpu)) {
+		pvclock_update_vm_gtod_copy(kvm);
+		kvm_end_pvclock_update(kvm);
+	}
 }
 
 /*
@@ -11485,8 +11517,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		kvm_mmu_free_obsolete_roots(vcpu);
 	if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
 		__kvm_migrate_timers(vcpu);
-	if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
-		kvm_update_masterclock(vcpu->kvm);
+	if (kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
+		kvm_update_masterclock(vcpu->kvm, vcpu);
 	if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu))
 		kvm_gen_kvmclock_update(vcpu);
 	if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
-- 
2.51.0