* Re: [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs [not found] <ahnk6cBLuuPIAwmA@google.com> @ 2026-06-02 12:37 ` David Woodhouse 2026-06-09 0:53 ` Sean Christopherson 0 siblings, 1 reply; 3+ messages in thread From: David Woodhouse @ 2026-06-02 12:37 UTC (permalink / raw) To: seanjc Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel, sashiko-reviews, dwmw [-- Attachment #1: Type: text/plain, Size: 1464 bytes --] On Fri, 29 May 2026 12:11:37 -0700, Sean Christopherson wrote: > On Fri, May 29, 2026, sashiko-bot@kernel.org wrote: > > [Severity: High] > > Does this unintentionally hold the gpc lock across the IPI kick? > > No, it intentionally holds the gpc lock across _sending_ the IPI kick. > > > [Severity: High] > > Does this now hold the VM-wide shinfo_cache lock while calling > > __kvm_xen_set_evtchn_fast() and kicking the vCPU? > > > > Since shinfo_map is a function-scoped CLASS() variable, its destructor > > won't release the lock until after __kvm_xen_set_evtchn_fast() returns. > > This creates a nested locking dependency and holds locks over expensive > > cross-vCPU operations, potentially serializing event channel deliveries > > across the entire VM on the fast path. > > __kvm_vcpu_kick() is neither expensive nor cross-vCPU. In the wait=false case, > which is the behavior of kvm_vcpu_kick(), it sends IPIs via smp_send_reschedule(), > i.e. it's more or less just __apic_send_IPI(cpu, RESCHEDULE_VECTOR), which is a > single WRMSR on modern harware. Hm, if we really are going to use a raw rwlock then I suppose we should take care not to spend any extra cycles with the lock held unless we actually *need* to? In which case I wonder if we really need the scoped CLASS() thing (that encourages people to run to the end of the function under the lock), or if the helpers you add for that are sufficient on their own? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5069 bytes --] ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs 2026-06-02 12:37 ` [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs David Woodhouse @ 2026-06-09 0:53 ` Sean Christopherson 0 siblings, 0 replies; 3+ messages in thread From: Sean Christopherson @ 2026-06-09 0:53 UTC (permalink / raw) To: David Woodhouse Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, hpa, kvm, linux-kernel, sashiko-reviews, dwmw On Tue, Jun 02, 2026, David Woodhouse wrote: > On Fri, 29 May 2026 12:11:37 -0700, Sean Christopherson wrote: > > On Fri, May 29, 2026, sashiko-bot@kernel.org wrote: > > > [Severity: High] > > > Does this unintentionally hold the gpc lock across the IPI kick? > > > > No, it intentionally holds the gpc lock across _sending_ the IPI kick. > > > > > [Severity: High] > > > Does this now hold the VM-wide shinfo_cache lock while calling > > > __kvm_xen_set_evtchn_fast() and kicking the vCPU? > > > > > > Since shinfo_map is a function-scoped CLASS() variable, its destructor > > > won't release the lock until after __kvm_xen_set_evtchn_fast() returns. > > > This creates a nested locking dependency and holds locks over expensive > > > cross-vCPU operations, potentially serializing event channel deliveries > > > across the entire VM on the fast path. > > > > __kvm_vcpu_kick() is neither expensive nor cross-vCPU. In the wait=false case, > > which is the behavior of kvm_vcpu_kick(), it sends IPIs via smp_send_reschedule(), > > i.e. it's more or less just __apic_send_IPI(cpu, RESCHEDULE_VECTOR), which is a > > single WRMSR on modern harware. > > Hm, if we really are going to use a raw rwlock then I suppose we should > take care not to spend any extra cycles with the lock held unless we > actually *need* to? > > In which case I wonder if we really need the scoped CLASS() thing (that > encourages people to run to the end of the function under the lock), or > if the helpers you add for that are sufficient on their own? Eh, I would argue that "the code is easier to read" is sufficient justification so long as holding the lock for a few tens of extra cycles doesn't cause problems in practice. ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v2 00/20] KVM: x86/xen: Fix Xen/GP/PREEMPT_RT issues with rwlock_t
@ 2026-05-29 16:50 Sean Christopherson
2026-05-29 16:51 ` [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs Sean Christopherson
0 siblings, 1 reply; 3+ messages in thread
From: Sean Christopherson @ 2026-05-29 16:50 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: Waiman Long, kvm, linux-kernel, David Woodhouse,
Sebastian Andrzej Siewior, syzbot+208f7f3e5f59c11aeb90,
Carsten Stollmaier
This series fixes sleeping-in-hardirq bugs in KVM's Xen emulation on
PREEMPT_RT, cleans up the now-unnecessary IRQ disabling in GPC lock usage
throughout KVM, and then adds CLASS()-based APIs for utilizing GPCs mappings
to dedup code and (hopefully) make it easier to use GPCs in other places.
The core issue is that kvm_xen_set_evtchn_fast() and the Xen timer
callback are called from hardirq/atomic context, but on PREEMPT_RT the
GPC rwlock_t is a sleeping lock.
Assuming I can get an Ack on patch 1, I'm planning on grabbing at least these
KVM: Move {g,p}fn <=> {g,h}pa conversion helpers to kvm_types.h
KVM: x86/xen: Don't dirty track "vCPU info" page
KVM: x86/xen: Explicitly tag "shared info" page as never being dirty tracked
KVM: x86/xen: Extract delivery of event to vCPU into a separate helper
KVM: x86/xen: Use guard() to grab kvm->srcu around gpc critical sections
KVM: Remove unnecessary IRQ disabling from GPC lock in pfncache.c
KVM: x86: Remove unnecessary irqsave from kvm_setup_guest_pvclock()
KVM: x86/xen: Remove unnecessary irqsave from GPC lock usage in xen.c
KVM: x86/xen: Use read_trylock() for GPC locks in hardirq/atomic paths
locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()
for 7.2. If people like the CLASS() stuff, I'll also probably grab these:
KVM: Add "extended" gpc CLASS() APIs for sometimes-atomic cases
KVM: x86/xen: Convert event injection to gpc's CLASS() APIs
KVM: x86/xen: Drop local "kick_vcpu" from __kvm_xen_set_evtchn_fast()
KVM: x86/xen: Convert xen_get_guest_pvclock() to gpc's CLASS() APIs
KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs
KVM: x86/xen: Convert wait_pending_event() to gpc's CLASS() APIs
KVM: x86/xen: Don't bother waiting on gpc->lock in SCHEDOP_poll
KVM: x86/xen: Convert kvm_xen_shared_info_init() to gpc's CLASS() APIs
KVM: Add CLASS() constructs to automagically handle lock+check of gpc
I do NOT plan on grabbing the record_steal_time change for 7.2 no matter
what, even though I do like the end result, as I still have concerns over the
lack of range-based invalidation for GPCs. I 100% agree that such problems
are really only due to flawed VMMs and/or setups, but unfortunately history
has shown that there are a suprising number of deployments running what I
would consider flawed setups, e.g. run with NUMA autobalancing and KSM.
I realize I'm being somewhat paranoid, as KVM already uses a GPC for PV
clocks. But for modern setups, KVM_REQ_CLOCK_UPDATE is a rare event, whereas
KVM will update steal time (when enabled) on every vCPU load. So I want a
high level of confidence that KVM won't regress "imperfect" setups before
switching to a GPC for steal time (though again, I definitely like the end
result and want to do so).
[*] https://lore.kernel.org/all/20240821202814.711673-2-dwmw2@infradead.org
v2:
- Add the CLASS() APIs.
- Move the steal time change to the very end.
- "Fix" a dirty logging inconsistency with the Xen vCPU info page.
v1: https://lore.kernel.org/all/20260508181717.3230988-1-dwmw2@infradead.org
Carsten Stollmaier (1):
KVM: x86: Use gfn_to_pfn_cache for record_steal_time
David Woodhouse (5):
locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()
KVM: x86/xen: Use read_trylock() for GPC locks in hardirq/atomic paths
KVM: x86/xen: Remove unnecessary irqsave from GPC lock usage in xen.c
KVM: x86: Remove unnecessary irqsave from kvm_setup_guest_pvclock()
KVM: Remove unnecessary IRQ disabling from GPC lock in pfncache.c
Sean Christopherson (14):
KVM: x86/xen: Use guard() to grab kvm->srcu around gpc critical
sections
KVM: x86/xen: Extract delivery of event to vCPU into a separate helper
KVM: x86/xen: Explicitly tag "shared info" page as never being dirty
tracked
KVM: x86/xen: Don't dirty track "vCPU info" page
KVM: Move {g,p}fn <=> {g,h}pa conversion helpers to kvm_types.h
KVM: Add CLASS() constructs to automagically handle lock+check of gpc
KVM: x86/xen: Convert kvm_xen_shared_info_init() to gpc's CLASS() APIs
KVM: x86/xen: Don't bother waiting on gpc->lock in SCHEDOP_poll
KVM: x86/xen: Convert wait_pending_event() to gpc's CLASS() APIs
KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs
KVM: x86/xen: Convert xen_get_guest_pvclock() to gpc's CLASS() APIs
KVM: x86/xen: Drop local "kick_vcpu" from __kvm_xen_set_evtchn_fast()
KVM: x86/xen: Convert event injection to gpc's CLASS() APIs
KVM: Add "extended" gpc CLASS() APIs for sometimes-atomic cases
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/x86.c | 140 +++++++---------
arch/x86/kvm/xen.c | 288 +++++++++++++-------------------
include/linux/kvm_host.h | 84 +++++++---
include/linux/kvm_types.h | 17 ++
kernel/locking/rwbase_rt.c | 5 +-
virt/kvm/pfncache.c | 68 ++++++--
7 files changed, 304 insertions(+), 300 deletions(-)
base-commit: d1568b1332b6b3b36b222c2868fc102727c12a34
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply [flat|nested] 3+ messages in thread* [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs 2026-05-29 16:50 [PATCH v2 00/20] KVM: x86/xen: Fix Xen/GP/PREEMPT_RT issues with rwlock_t Sean Christopherson @ 2026-05-29 16:51 ` Sean Christopherson 0 siblings, 0 replies; 3+ messages in thread From: Sean Christopherson @ 2026-05-29 16:51 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng Cc: Waiman Long, kvm, linux-kernel, David Woodhouse, Sebastian Andrzej Siewior, syzbot+208f7f3e5f59c11aeb90, Carsten Stollmaier Convert the event channel fastpath to the "map local" CLASS() APIs, using the "try" variants as the faspath can't block. Note! The vcpu_info mapping is read/write, even though there is no existing call to mark the page dirty. Like Xen's shared info page, the vCPU info page is assumed to be dirty at all times, and so isn't marked dirty after every write. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/xen.c | 42 ++++++++++++++++-------------------------- 1 file changed, 16 insertions(+), 26 deletions(-) diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 8f822acb11a4..47750316f132 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -1774,29 +1774,28 @@ static void __kvm_xen_set_evtchn_fast(struct kvm_vcpu *vcpu, int port_word_bit) bool kick_vcpu = false; /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ - if (!read_trylock(&gpc->lock)) { - /* - * Could not access the vcpu_info. Set the bit in-kernel and - * prod the vCPU to deliver it for itself. - */ + CLASS(gpc_try_map_local, vcpu_info_map)(gpc, sizeof(struct vcpu_info)); + + /* + * If the vcpu_info is inaccessible, set the bit in-kernel and prod the + * vCPU to deliver it for itself. + */ + if (IS_ERR(vcpu_info_map)) { if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel)) kick_vcpu = true; goto out_kick; } - if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { - if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel)) - kick_vcpu = true; - goto out_unlock; - } if (IS_ENABLED(CONFIG_64BIT) && vcpu->kvm->arch.xen.long_mode) { - struct vcpu_info *vcpu_info = gpc->khva; + struct vcpu_info *vcpu_info = *vcpu_info_map; + if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) { WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); kick_vcpu = true; } } else { - struct compat_vcpu_info *vcpu_info = gpc->khva; + struct compat_vcpu_info *vcpu_info = *vcpu_info_map; + if (!test_and_set_bit(port_word_bit, (unsigned long *)&vcpu_info->evtchn_pending_sel)) { WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); @@ -1810,8 +1809,6 @@ static void __kvm_xen_set_evtchn_fast(struct kvm_vcpu *vcpu, int port_word_bit) kick_vcpu = false; } -out_unlock: - read_unlock(&gpc->lock); out_kick: if (kick_vcpu) { kvm_make_request(KVM_REQ_UNBLOCK, vcpu); @@ -1850,23 +1847,19 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) if (xe->port >= max_evtchn_port(kvm)) return -EINVAL; - rc = -EWOULDBLOCK; - guard(srcu)(&kvm->srcu); - if (!read_trylock(&gpc->lock)) - return rc; - - if (!kvm_gpc_check(gpc, PAGE_SIZE)) - goto out_unlock; + CLASS(gpc_try_map_local, shinfo_map)(gpc, PAGE_SIZE); + if (IS_ERR(shinfo_map)) + return -EWOULDBLOCK; if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { - struct shared_info *shinfo = gpc->khva; + struct shared_info *shinfo = *shinfo_map; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; port_word_bit = xe->port / 64; } else { - struct compat_shared_info *shinfo = gpc->khva; + struct compat_shared_info *shinfo = *shinfo_map; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; port_word_bit = xe->port / 32; @@ -1888,9 +1881,6 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) rc = 1; /* Delivered to the bitmap in shared_info. */ } -out_unlock: - read_unlock(&gpc->lock); - if (rc == 1) __kvm_xen_set_evtchn_fast(vcpu, port_word_bit); return rc; -- 2.54.0.823.g6e5bcc1fc9-goog ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-09 0:53 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <ahnk6cBLuuPIAwmA@google.com>
2026-06-02 12:37 ` [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs David Woodhouse
2026-06-09 0:53 ` Sean Christopherson
2026-05-29 16:50 [PATCH v2 00/20] KVM: x86/xen: Fix Xen/GP/PREEMPT_RT issues with rwlock_t Sean Christopherson
2026-05-29 16:51 ` [PATCH v2 15/20] KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs Sean Christopherson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox