Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time

From: sashiko-bot@kernel.org
To: "Sean Christopherson" <seanjc@google.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time
Date: Sat, 30 May 2026 06:19:32 +0000
Message-ID: <20260530061932.7849D1F00893@smtp.kernel.org>
In-Reply-To: <20260529165114.748639-21-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [High] Calling sleepable operations while holding an rwlock (gpc->lock) in record_steal_time() causes 'scheduling while atomic' bugs.
- [High] Calling read_unlock() on an RT rwlock from sched_out() triggers wake_up_process() while rq->lock is held, causing an ABBA deadlock.
--

commit cb799f9970f13f02eaea4c9d6c87fa7537434bcf
Author: Carsten Stollmaier <stollmc@amazon.com>
KVM: x86: Use gfn_to_pfn_cache for record_steal_time

This commit reverts to using the gfn_to_pfn_cache for guest steal time and
preempted status updates to avoid mmap_lock contention with userfaultfd. It
replaces unsafe_put_user accesses with plain C reads and writes. It also
uses a read_trylock in the scheduler path to avoid sleeping under PREEMPT_RT.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3746,42 +3744,20 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
[ ... ]
> +	CLASS(gpc_map_local, st_map)(gpc, sizeof(*st));
> +	if (IS_ERR(st_map))
> +		return;
>  
> -	st = (struct kvm_steal_time __user *)ghc->hva;
>  	/*
>  	 * Doing a TLB flush here, on the guest's behalf, can avoid
>  	 * expensive IPIs.
>  	 */
>  	if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
[ ... ]
> +		u8 st_preempted = xchg(&st->preempted, 0);
>  
>  		vcpu->arch.st.preempted = 0;
>  
> @@ -3789,39 +3765,30 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  				       st_preempted & KVM_VCPU_FLUSH_TLB);
>  		if (st_preempted & KVM_VCPU_FLUSH_TLB)
>  			kvm_vcpu_flush_tlb_guest(vcpu);

[Severity: High]
Does this introduce a 'scheduling while atomic' bug on non-PREEMPT_RT kernels?

The CLASS(gpc_map_local, st_map) macro acquires a read_lock on gpc->lock,
which disables preemption. While this lock is held, if the guest supports
PV TLB flush, the code calls kvm_vcpu_flush_tlb_guest().

If TDP is disabled (shadow paging), kvm_vcpu_flush_tlb_guest() calls
kvm_mmu_sync_roots() and eventually mmu_sync_children(). This path can yield
via cond_resched_rwlock_write(). Yielding while preemption is disabled by
the gpc read lock will trigger a BUG.

The old code safely dropped the user access guard before executing the
TLB flush, avoiding this issue.
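
One possible shape for avoiding this is to latch the flag while the lock
is held and perform the flush only after the lock is dropped. The sketch
below is a user-space toy model, not kernel code: every model_* name,
the FLUSH_TLB_FLAG stand-in, and the counters are hypothetical, and the
"lock" is just a depth counter that models preemption being disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLUSH_TLB_FLAG 0x2u	/* hypothetical stand-in for KVM_VCPU_FLUSH_TLB */

static int atomic_depth;	/* > 0 models "preemption disabled" */
static int sleeps_in_atomic;	/* modelled "scheduling while atomic" hits */
static int flushes;		/* number of modelled TLB flushes */

/* Stand-ins for taking and dropping a read lock on gpc->lock. */
static void model_gpc_read_lock(void)
{
	atomic_depth++;
}

static void model_gpc_read_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stand-in for a flush that may yield (e.g. shadow paging root sync). */
static void model_flush_tlb_guest(void)
{
	if (atomic_depth > 0)
		sleeps_in_atomic++;
	flushes++;
}

/* Pattern under review: the flush runs inside the critical section. */
static void model_flush_under_lock(_Atomic uint8_t *preempted)
{
	uint8_t old;

	model_gpc_read_lock();
	old = atomic_exchange(preempted, 0);
	if (old & FLUSH_TLB_FLAG)
		model_flush_tlb_guest();
	model_gpc_read_unlock();
}

/* Alternative: latch the flag under the lock, act on it after unlock. */
static void model_flush_after_unlock(_Atomic uint8_t *preempted)
{
	uint8_t old;

	model_gpc_read_lock();
	old = atomic_exchange(preempted, 0);
	model_gpc_read_unlock();

	if (old & FLUSH_TLB_FLAG)
		model_flush_tlb_guest();
}
```

In this model only model_flush_under_lock() increments sleeps_in_atomic.
Whether the real fix should restructure the scope of the CLASS() guard
or defer the flush some other way is left to the patch author.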

[ ... ]
> @@ -5260,20 +5227,32 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
[ ... ]
> -	st = (struct kvm_steal_time __user *)ghc->hva;
> -	BUILD_BUG_ON(sizeof(st->preempted) != sizeof(preempted));
> -
> -	if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
> -		vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
> +	/*
> +	 * Use a trylock as this is called from the scheduler path (via
> +	 * kvm_sched_out), where rwlock_t is not safe on PREEMPT_RT (it
> +	 * becomes sleepable).  Setting preempted is best-effort anyway;
> +	 * the old HVA-based code used copy_to_user_nofault() which could
> +	 * also silently fail.
> +	 *
> +	 * Since we only trylock and bail on failure, there is no risk of
> +	 * deadlock with an interrupt handler, so no need to disable
> +	 * interrupts.
> +	 */
> +	CLASS(gpc_try_map_local, st_map)(gpc, sizeof(st->preempted));
> +	if (IS_ERR(st_map))
>  		return;
>  
> -	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
> +	st = *st_map;
> +	WRITE_ONCE(st->preempted, KVM_VCPU_PREEMPTED);
> +	vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
> +}

[Severity: High]
Can this cause an ABBA deadlock in the scheduler on PREEMPT_RT?

kvm_steal_time_set_preempted() is called during a vCPU context switch
via kvm_sched_out(), which is a preempt notifier hook. This runs with
the local CPU's runqueue lock (rq->lock) held.

While the code correctly uses a trylock to avoid sleeping when acquiring
the lock, the CLASS macro destructor will implicitly call read_unlock()
at the end of the function. On PREEMPT_RT kernels, unlocking an rt_rwlock
can wake up a waiting writer, such as a concurrent kvm_gpc_refresh().

If a writer is woken, rt_mutex_wake_up_q() calls wake_up_process(), which
attempts to acquire the target task's rq->lock via try_to_wake_up().
Attempting to acquire a runqueue lock while the local CPU's rq->lock is
already held violates scheduler lock ordering and leads to a deadlock.
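
The ordering concern can be illustrated with a user-space toy model,
assuming the unlock path behaves as described above. Nothing here is
kernel code: the rt_model_* names, the rq_lock_held flag, and the
counter are hypothetical, and the model only records that a wakeup was
attempted while a runqueue lock was already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RT rwlock with at most one waiting writer. */
struct rt_model_rwlock {
	int readers;
	bool writer_waiting;
};

static bool rq_lock_held;		/* models the local CPU's rq->lock */
static int nested_rq_lock_attempts;	/* wakeups attempted under rq->lock */

/* Models a wakeup, which needs the target task's runqueue lock. */
static void rt_model_wake_up_process(void)
{
	if (rq_lock_held)
		nested_rq_lock_attempts++;
}

/* Never blocks: fails if a writer is already waiting. */
static bool rt_model_read_trylock(struct rt_model_rwlock *l)
{
	if (l->writer_waiting)
		return false;
	l->readers++;
	return true;
}

/* The last reader out wakes a writer that queued up in the meantime. */
static void rt_model_read_unlock(struct rt_model_rwlock *l)
{
	assert(l->readers > 0);
	if (--l->readers == 0 && l->writer_waiting) {
		l->writer_waiting = false;
		rt_model_wake_up_process();
	}
}
```

The point of the model is that the trylock itself is harmless, while
the unlock is where the wakeup happens: if a writer queues between
rt_model_read_trylock() and rt_model_read_unlock() while rq_lock_held
is set, nested_rq_lock_attempts becomes non-zero.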

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529165114.748639-1-seanjc@google.com?part=20
