Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: sashiko-bot@kernel.org, pbonzini@redhat.com, tglx@kernel.org,
	 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org,  hpa@zytor.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  sashiko-reviews@lists.linux.dev,
	stollmc@amazon.com, dwmw@amazon.co.uk
Subject: Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time
Date: Mon, 8 Jun 2026 17:45:45 -0700	[thread overview]
Message-ID: <aidiOemcna1Un9wG@google.com> (raw)
In-Reply-To: <3291d47ea5fdd6ee2f284cadb518473130954cfc.camel@infradead.org>

On Tue, Jun 02, 2026, David Woodhouse wrote:
> On Sat, 30 May 2026 06:19:32 +0000, sashiko-bot@kernel.org wrote:
> > [Severity: High]
> > Does this introduce a scheduling while atomic bug on non-PREEMPT_RT kernels?
> >
> > The CLASS(gpc_map_local, st_map) macro acquires a read_lock on gpc->lock,
> > which disables preemption. While this lock is held, if the guest supports
> > PV TLB flush, the code calls kvm_vcpu_flush_tlb_guest().
> >
> > If TDP is disabled (shadow paging), kvm_vcpu_flush_tlb_guest() calls
> > kvm_mmu_sync_roots() and eventually mmu_sync_children(). This path can yield
> > via cond_resched_rwlock_write(). Yielding while preemption is disabled by
> > the gpc read lock will trigger a BUG.
> 
> Ah, that issue exists in the previous versions too, but it's simple
> enough to fix. There's no particular timing constraint for flushing the
> TLB; it just have to be done before this vCPU ever runs again. It can
> just be moved to the end of the function after the lock is dropped.
> 
> That does mean record_steal_time() should use the explicit
> gpc_map_local_lock()/gpc_map_local_unlock() instead of the CLASS()
> macro, but that's easy enough.

Actually, we use KVM_REQ_TLB_FLUSH_GUEST and "optimize" the code for the rare
case where KVM already have a TLB flushed queued for the vCPU.  E.g. over two
patches (so that changing the order of the request processing is isolated):

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b27dd9ba0aa..48234eeb246b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3764,7 +3764,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
                trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
                                       st_preempted & KVM_VCPU_FLUSH_TLB);
                if (st_preempted & KVM_VCPU_FLUSH_TLB)
-                       kvm_vcpu_flush_tlb_guest(vcpu);
+                       kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
        } else {
                WRITE_ONCE(st->preempted, 0);
                vcpu->arch.st.preempted = 0;
@@ -11165,6 +11165,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
                        if (unlikely(r))
                                goto out;
                }
+               if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
+                       record_steal_time(vcpu);
                if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
                        kvm_mmu_sync_roots(vcpu);
                if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
@@ -11214,8 +11216,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
                        r = 1;
                        goto out;
                }
-               if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
-                       record_steal_time(vcpu);
                if (kvm_check_request(KVM_REQ_PMU, vcpu))
                        kvm_pmu_handle_event(vcpu);
                if (kvm_check_request(KVM_REQ_PMI, vcpu))


KVM needs to ensure the RMW on st->preempted is atomic, to avoid re-introducing
the bug fixed by commit b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag
is not missed"), but AFAICT there's nothing that requires to complete the TLB
flush before bumping the version, KVM just needs to service the flush before
entering the guest on that vCPU.

next prev parent reply	other threads:[~2026-06-09  0:45 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260530061932.7849D1F00893@smtp.kernel.org>
2026-06-02 12:29 ` [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time David Woodhouse
2026-06-09  0:45   ` Sean Christopherson [this message]
2026-05-29 16:50 [PATCH v2 00/20] KVM: x86/xen: Fix Xen/GP/PREEMPT_RT issues with rwlock_t Sean Christopherson
2026-05-29 16:51 ` [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time Sean Christopherson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:1b27dd9ba0a dfblob:48234eeb246 )
 OR (
bs:"Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aidiOemcna1Un9wG@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=dwmw@amazon.co.uk \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=sashiko-bot@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=stollmc@amazon.com \
    --cc=tglx@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox