Re: [PATCH v17 14/24] KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	Ira Weiny <ira.weiny@intel.com>, Gavin Shan <gshan@redhat.com>,
	Shivank Garg <shivankg@amd.com>, Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	Fuad Tabba <tabba@google.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Tao Chan <chentao@kylinos.cn>,
	James Houghton <jthoughton@google.com>
Subject: Re: [PATCH v17 14/24] KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepages
Date: Wed, 30 Jul 2025 15:33:51 +0800	[thread overview]
Message-ID: <e67b0825-abcb-4bac-bc3c-2e8b513f1d57@intel.com> (raw)
In-Reply-To: <20250729225455.670324-15-seanjc@google.com>

On 7/30/2025 6:54 AM, Sean Christopherson wrote:
> Rework kvm_mmu_max_mapping_level() to provide the plumbing to consult
> guest_memfd (and relevant vendor code) when recovering hugepages, e.g.
> after disabling live migration.  The flaw has existed since guest_memfd was
> originally added, but has gone unnoticed due to lack of guest_memfd support
> for hugepages or dirty logging.
> 
> Don't actually call into guest_memfd at this time, as it's unclear as to
> what the API should be.  Ideally, KVM would simply use kvm_gmem_get_pfn(),
> but invoking kvm_gmem_get_pfn() would lead to sleeping in atomic context
> if guest_memfd needed to allocate memory (mmu_lock is held).  Luckily,
> the path isn't actually reachable, so just add a TODO and WARN to ensure
> the functionality is added alongisde guest_memfd hugepage support, and
> punt the guest_memfd API design question to the future.
> 
> Note, calling kvm_mem_is_private() in the non-fault path is safe, so long
> as mmu_lock is held, as hugepage recovery operates on shadow-present SPTEs,
> i.e. calling kvm_mmu_max_mapping_level() with @fault=NULL is mutually
> exclusive with kvm_vm_set_mem_attributes() changing the PRIVATE attribute
> of the gfn.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c          | 82 +++++++++++++++++++--------------
>   arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>   arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
>   3 files changed, 49 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 20dd9f64156e..61eb9f723675 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3302,31 +3302,54 @@ static u8 kvm_max_level_for_order(int order)
>   	return PG_LEVEL_4K;
>   }
>   
> -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> -					u8 max_level, int gmem_order)
> +static u8 kvm_max_private_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> +					const struct kvm_memory_slot *slot, gfn_t gfn)

I don't see why slot and gfn are needed here. Just to keep consistent 
with host_pfn_mapping_level()?

>   {
> -	u8 req_max_level;
> +	u8 max_level, coco_level;
> +	kvm_pfn_t pfn;
>   
> -	if (max_level == PG_LEVEL_4K)
> -		return PG_LEVEL_4K;
> +	/* For faults, use the gmem information that was resolved earlier. */
> +	if (fault) {
> +		pfn = fault->pfn;
> +		max_level = fault->max_level;
> +	} else {
> +		/* TODO: Call into guest_memfd once hugepages are supported. */
> +		WARN_ONCE(1, "Get pfn+order from guest_memfd");
> +		pfn = KVM_PFN_ERR_FAULT;
> +		max_level = PG_LEVEL_4K;
> +	}
>   
> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
>   	if (max_level == PG_LEVEL_4K)
> -		return PG_LEVEL_4K;
> +		return max_level;
>   
> -	req_max_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
> -	if (req_max_level)
> -		max_level = min(max_level, req_max_level);
> +	/*
> +	 * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
> +	 * restrictions.  A return of '0' means "no additional restrictions", to
> +	 * allow for using an optional "ret0" static call.
> +	 */
> +	coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn);
> +	if (coco_level)
> +		max_level = min(max_level, coco_level);
>   
>   	return max_level;
>   }
>   
> -static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> -				       const struct kvm_memory_slot *slot,
> -				       gfn_t gfn, int max_level, bool is_private)
> +int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> +			      const struct kvm_memory_slot *slot, gfn_t gfn)
>   {
>   	struct kvm_lpage_info *linfo;
> -	int host_level;
> +	int host_level, max_level;
> +	bool is_private;
> +
> +	lockdep_assert_held(&kvm->mmu_lock);
> +
> +	if (fault) {
> +		max_level = fault->max_level;
> +		is_private = fault->is_private;
> +	} else {
> +		max_level = PG_LEVEL_NUM;
> +		is_private = kvm_mem_is_private(kvm, gfn);
> +	}
>   
>   	max_level = min(max_level, max_huge_page_level);
>   	for ( ; max_level > PG_LEVEL_4K; max_level--) {
> @@ -3335,25 +3358,16 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
>   			break;
>   	}
>   
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
> +
>   	if (is_private)
> -		return max_level;
> -
> -	if (max_level == PG_LEVEL_4K)
> -		return PG_LEVEL_4K;
> -
> -	host_level = host_pfn_mapping_level(kvm, gfn, slot);
> +		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> +	else
> +		host_level = host_pfn_mapping_level(kvm, gfn, slot);
>   	return min(host_level, max_level);
>   }
>   
> -int kvm_mmu_max_mapping_level(struct kvm *kvm,
> -			      const struct kvm_memory_slot *slot, gfn_t gfn)
> -{
> -	bool is_private = kvm_slot_has_gmem(slot) &&
> -			  kvm_mem_is_private(kvm, gfn);
> -
> -	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
> -}
> -
>   void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>   {
>   	struct kvm_memory_slot *slot = fault->slot;
> @@ -3374,9 +3388,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>   	 * Enforce the iTLB multihit workaround after capturing the requested
>   	 * level, which will be used to do precise, accurate accounting.
>   	 */
> -	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
> -						       fault->gfn, fault->max_level,
> -						       fault->is_private);
> +	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, fault,
> +						     fault->slot, fault->gfn);
>   	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
>   		return;
>   
> @@ -4564,8 +4577,7 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
>   	}
>   
>   	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
> -							 fault->max_level, max_order);
> +	fault->max_level = kvm_max_level_for_order(max_order);
>   
>   	return RET_PF_CONTINUE;
>   }
> @@ -7165,7 +7177,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>   		 * mapping if the indirect sp has level = 1.
>   		 */
>   		if (sp->role.direct &&
> -		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn)) {
> +		    sp->role.level < kvm_mmu_max_mapping_level(kvm, NULL, slot, sp->gfn)) {
>   			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
>   
>   			if (kvm_available_flush_remote_tlbs_range())
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 65f3c89d7c5d..b776be783a2f 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -411,7 +411,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   	return r;
>   }
>   
> -int kvm_mmu_max_mapping_level(struct kvm *kvm,
> +int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>   			      const struct kvm_memory_slot *slot, gfn_t gfn);
>   void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>   void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7f3d7229b2c1..740cb06accdb 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1813,7 +1813,7 @@ static void recover_huge_pages_range(struct kvm *kvm,
>   		if (iter.gfn < start || iter.gfn >= end)
>   			continue;
>   
> -		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot, iter.gfn);
> +		max_mapping_level = kvm_mmu_max_mapping_level(kvm, NULL, slot, iter.gfn);
>   		if (max_mapping_level < iter.level)
>   			continue;
>

next prev parent reply	other threads:[~2025-07-30  7:34 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29 22:54 [PATCH v17 00/24] KVM: Enable mmap() for guest_memfd Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 01/24] KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 02/24] KVM: x86: Have all vendor neutral sub-configs depend on KVM_X86, not just KVM Sean Christopherson
2025-07-31  8:08   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 03/24] KVM: x86: Select KVM_GENERIC_PRIVATE_MEM directly from KVM_SW_PROTECTED_VM Sean Christopherson
2025-07-31  8:08   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 04/24] KVM: x86: Select TDX's KVM_GENERIC_xxx dependencies iff CONFIG_KVM_INTEL_TDX=y Sean Christopherson
2025-07-31  8:07   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 05/24] KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 06/24] KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 07/24] KVM: Fix comments that refer to slots_lock Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 08/24] KVM: Fix comment that refers to kvm uapi header path Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 09/24] KVM: x86: Enable KVM_GUEST_MEMFD for all 64-bit builds Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 10/24] KVM: guest_memfd: Add plumbing to host to map guest_memfd pages Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 11/24] KVM: guest_memfd: Track guest_memfd mmap support in memslot Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 12/24] KVM: x86/mmu: Rename .private_max_mapping_level() to .gmem_max_mapping_level() Sean Christopherson
2025-07-31  8:15   ` Fuad Tabba
2025-07-31  8:29     ` David Hildenbrand
2025-07-31  8:33       ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 13/24] KVM: x86/mmu: Hoist guest_memfd max level/order helpers "up" in mmu.c Sean Christopherson
2025-07-31  7:59   ` David Hildenbrand
2025-07-31  8:06   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 14/24] KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepages Sean Christopherson
2025-07-30  7:33   ` Xiaoyao Li [this message]
2025-07-31  8:06     ` David Hildenbrand
2025-07-31  8:10   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 15/24] KVM: x86/mmu: Extend guest_memfd's max mapping level to shared mappings Sean Christopherson
2025-07-30  7:36   ` Xiaoyao Li
2025-07-31  8:01   ` David Hildenbrand
2025-07-31  8:05   ` Fuad Tabba
2025-07-29 22:54 ` [PATCH v17 16/24] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory Sean Christopherson
2025-07-30  7:37   ` Xiaoyao Li
2025-07-29 22:54 ` [PATCH v17 17/24] KVM: arm64: Refactor user_mem_abort() Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 18/24] KVM: arm64: Handle guest_memfd-backed guest page faults Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 19/24] KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 20/24] KVM: arm64: Enable support for guest_memfd backed memory Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 21/24] KVM: Allow and advertise support for host mmap() on guest_memfd files Sean Christopherson
2025-07-29 22:54 ` [PATCH v17 22/24] KVM: selftests: Do not use hardcoded page sizes in guest_memfd test Sean Christopherson
2025-07-30 11:04   ` Xiaoyao Li
2025-07-29 22:54 ` [PATCH v17 23/24] KVM: selftests: guest_memfd mmap() test when mmap is supported Sean Christopherson
2025-07-30 11:39   ` Xiaoyao Li
2025-07-30 12:57     ` Sean Christopherson
2025-07-31  7:49       ` Xiaoyao Li
2025-08-07  8:12   ` Shivank Garg
2025-07-29 22:54 ` [PATCH v17 24/24] KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory Sean Christopherson
2025-07-30  8:20   ` Xiaoyao Li
2025-07-30 15:51   ` Fuad Tabba
2026-03-30  6:21   ` Zenghui Yu
2026-04-17 16:47     ` Sean Christopherson
2026-05-12  7:28       ` Zenghui Yu
2026-05-12 15:53         ` Sean Christopherson
2025-07-30 21:34 ` [PATCH v17 00/24] KVM: Enable mmap() for guest_memfd Ackerley Tng
2025-07-30 22:44   ` Ackerley Tng
2025-08-27  8:43 ` Paolo Bonzini
2025-08-27 12:57   ` Sean Christopherson
2025-08-27 13:08   ` Marc Zyngier
2025-08-27 13:11     ` Paolo Bonzini
2025-08-27 13:14       ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e67b0825-abcb-4bac-bc3c-2e8b513f1d57@intel.com \
    --to=xiaoyao.li@intel.com \
    --cc=ackerleytng@google.com \
    --cc=chentao@kylinos.cn \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shivankg@amd.com \
    --cc=tabba@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.