From: Binbin Wu <binbin.wu@linux.intel.com>
To: Yan Zhao <yan.y.zhao@intel.com>, seanjc@google.com
Cc: pbonzini@redhat.com, reinette.chatre@intel.com,
rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH v2 1/3] KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault
Date: Tue, 9 Sep 2025 10:46:31 +0800 [thread overview]
Message-ID: <ea1603bc-68f2-44cd-8cdf-ec5969486dea@linux.intel.com> (raw)
In-Reply-To: <20250822070347.26451-1-yan.y.zhao@intel.com>
On 8/22/2025 3:03 PM, Yan Zhao wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Return -EAGAIN if userspace attempts to delete or move a memslot while also
> prefaulting memory for that same memslot, i.e. force userspace to retry
> instead of trying to handle the scenario entirely within KVM. Unlike
> KVM_RUN, which needs to handle the scenario entirely within KVM because
> userspace has come to depend on such behavior, KVM_PRE_FAULT_MEMORY can
> return -EAGAIN without breaking userspace as this scenario can't have ever
> worked (and there's no sane use case for prefaulting to a memslot that's
> being deleted/moved).
>
> And also unlike KVM_RUN, the prefault path doesn't naturally gaurantee
gaurantee -> guarantee
> forward progress. E.g. to handle such a scenario, KVM would need to drop
> and reacquire SRCU to break the deadlock between the memslot update
> (synchronizes SRCU) and the prefault (waits for the memslot update to
> complete).
>
> However, dropping SRCU creates more problems, as completing the memslot
> update will bump the memslot generation, which in turn will invalidate the
> MMU root. To handle that, prefaulting would need to handle pending
> KVM_REQ_MMU_FREE_OBSOLETE_ROOTS requests and do kvm_mmu_reload() prior to
> mapping each individual.
>
> I.e. to fully handle this scenario, prefaulting would eventually need to
> look a lot like vcpu_enter_guest(). Given that there's no reasonable use
> case and practically zero risk of breaking userspace, punt the problem to
> userspace and avoid adding unnecessary complexity to the prefualt path.
prefualt -> prefault
>
> Note, TDX's guest_memfd post-populate path is unaffected as slots_lock is
> held for the entire duration of populate(), i.e. any memslot modifications
> will be fully serialized against TDX's flavor of prefaulting.
>
> Reported-by: Reinette Chatre <reinette.chatre@intel.com>
> Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com
> Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Two typos above.
Otherwise,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 92ff15969a36..f31fad33c423 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4653,10 +4653,16 @@ static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
> /*
> * Retry the page fault if the gfn hit a memslot that is being deleted
> * or moved. This ensures any existing SPTEs for the old memslot will
> - * be zapped before KVM inserts a new MMIO SPTE for the gfn.
> + * be zapped before KVM inserts a new MMIO SPTE for the gfn. Punt the
> + * error to userspace if this is a prefault, as KVM's prefaulting ABI
> + * doesn't need provide the same forward progress guarantees as KVM_RUN.
> */
> - if (slot->flags & KVM_MEMSLOT_INVALID)
> + if (slot->flags & KVM_MEMSLOT_INVALID) {
> + if (fault->prefetch)
> + return -EAGAIN;
> +
> return RET_PF_RETRY;
> + }
>
> if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
> /*
next prev parent reply other threads:[~2025-09-09 2:46 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-22 7:03 [PATCH v2 0/3] KVM: Fix deadlock for invalid memslots Yan Zhao
2025-08-22 7:03 ` [PATCH v2 1/3] KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault Yan Zhao
2025-09-09 2:46 ` Binbin Wu [this message]
2025-08-22 7:05 ` [PATCH v2 2/3] KVM: TDX: Do not retry locally when the retry is caused by invalid memslot Yan Zhao
2025-09-09 3:29 ` Binbin Wu
2025-09-09 14:18 ` Sean Christopherson
2025-09-10 2:02 ` Binbin Wu
2025-08-22 7:05 ` [PATCH v2 3/3] KVM: selftests: Test prefault memory during concurrent memslot removal Yan Zhao
2025-09-08 23:47 ` Sean Christopherson
2025-09-15 8:21 ` Yan Zhao
2025-09-16 0:14 ` Sean Christopherson
2025-09-24 17:09 ` Sean Christopherson
2025-09-16 0:25 ` [PATCH v2 0/3] KVM: Fix deadlock for invalid memslots Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ea1603bc-68f2-44cd-8cdf-ec5969486dea@linux.intel.com \
--to=binbin.wu@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=seanjc@google.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.