From: Ackerley Tng <ackerleytng@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>, pbonzini@redhat.com, seanjc@google.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	x86@kernel.org,  rick.p.edgecombe@intel.com,
	dave.hansen@intel.com, kas@kernel.org,  tabba@google.com,
	quic_eberman@quicinc.com, michael.roth@amd.com,
	 david@redhat.com, vannapurve@google.com, vbabka@suse.cz,
	 thomas.lendacky@amd.com, pgonda@google.com,
	zhiquan1.li@intel.com,  fan.du@intel.com, jun.miao@intel.com,
	ira.weiny@intel.com,  isaku.yamahata@intel.com,
	xiaoyao.li@intel.com, binbin.wu@linux.intel.com,
	 chao.p.peng@intel.com, yan.y.zhao@intel.com
Subject: Re: [RFC PATCH v2 17/23] KVM: guest_memfd: Split for punch hole and private-to-shared conversion
Date: Wed, 01 Oct 2025 06:21:47 +0000	[thread overview]
Message-ID: <diqzh5wj2lc4.fsf@google.com> (raw)
In-Reply-To: <20250807094503.4691-1-yan.y.zhao@intel.com>

Yan Zhao <yan.y.zhao@intel.com> writes:

Thanks Yan! Just got around to looking at this, sorry about the delay!

> In TDX, private page tables require precise zapping because faulting back
> the zapped mappings necessitates the guest's re-acceptance. Therefore,
> before performing a zap for hole punching and private-to-shared
> conversions, huge leafs that cross the boundary of the zapping GFN range in
> the mirror page table must be split.
>
> Splitting may result in an error. If this happens, hole punching and
> private-to-shared conversion should bail out early and return an error to
> userspace.
>
> Splitting is not necessary for kvm_gmem_release() since the entire page
> table is being zapped, nor for kvm_gmem_error_folio() as an SPTE must not
> map more than one physical folio.
>
> Therefore, in this patch,
> - break kvm_gmem_invalidate_begin_and_zap() into
>   kvm_gmem_invalidate_begin() and kvm_gmem_zap() and have
>   kvm_gmem_release() and kvm_gmem_error_folio() to invoke them.
>

I think separating invalidate and zap could be a separate patch from
adding the split step into the flow; that would make this patch smaller
and easier to review.

No action required from you for now; I already have the above part in a
separate patch (not yet posted).

> - have kvm_gmem_punch_hole() to invoke kvm_gmem_invalidate_begin(),
>   kvm_gmem_split_private(), and kvm_gmem_zap().
>   Bail out if kvm_gmem_split_private() returns error.
>

IIUC the current upstream position is that hole punching will not
be permitted for ranges smaller than the page size for the entire
guest_memfd.

Hence no splitting required during hole punch?

+ 4K guest_memfd: no splitting required since the EPT entries will not
  be larger than 4K anyway
+ 2M and 1G (x86) guest_memfd: no splitting required since the entire
  EPT entry will have to go away for valid ranges (valid ranges are
  either 2M or 1G anyway)

Does that sound right?

> - drop the old kvm_gmem_unmap_private() and have private-to-shared
>   conversion to invoke kvm_gmem_split_private() and kvm_gmem_zap() instead.
>   Bail out if kvm_gmem_split_private() returns error.
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> 
> [...snip...]
> 
> @@ -514,6 +554,8 @@ static int kvm_gmem_convert_should_proceed(struct inode *inode,
>  					   struct conversion_work *work,
>  					   bool to_shared, pgoff_t *error_index)
>  {
> +	int ret = 0;
> +
>  	if (to_shared) {
>  		struct list_head *gmem_list;
>  		struct kvm_gmem *gmem;
> @@ -522,19 +564,24 @@ static int kvm_gmem_convert_should_proceed(struct inode *inode,
>  		work_end = work->start + work->nr_pages;
>  
>  		gmem_list = &inode->i_mapping->i_private_list;
> +		list_for_each_entry(gmem, gmem_list, entry) {
> +			ret = kvm_gmem_split_private(gmem, work->start, work_end);
> +			if (ret)
> +				return ret;
> +		}

I will be refactoring the conversion steps a little for the next version
of this series, so I'd like to confirm the requirements around
splitting.

The requirement is to split before zapping, right? Other than that, we
technically don't need to split before checking for a safe refcount,
right?

>  		list_for_each_entry(gmem, gmem_list, entry)
> -			kvm_gmem_unmap_private(gmem, work->start, work_end);
> +			kvm_gmem_zap(gmem, work->start, work_end, KVM_FILTER_PRIVATE);
>  	} else {
>  		unmap_mapping_pages(inode->i_mapping, work->start,
>  				    work->nr_pages, false);
>  
>  		if (!kvm_gmem_has_safe_refcount(inode->i_mapping, work->start,
>  						work->nr_pages, error_index)) {
> -			return -EAGAIN;
> +			ret = -EAGAIN;
>  		}
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
> 
> [...snip...]
> 
> @@ -1906,8 +1926,14 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  	start = folio->index;
>  	end = start + folio_nr_pages(folio);
>  
> -	list_for_each_entry(gmem, gmem_list, entry)
> -		kvm_gmem_invalidate_begin_and_zap(gmem, start, end);
> +	/* The size of the SEPT will not exceed the size of the folio */

I think splitting might be required here, but that depends on whether we
want to unmap just a part of the huge folio or whether we want to unmap
the entire folio.

Lots of open questions on memory failure handling, but for now I think
this makes sense.

> +	list_for_each_entry(gmem, gmem_list, entry) {
> +		enum kvm_gfn_range_filter filter;
> +
> +		kvm_gmem_invalidate_begin(gmem, start, end);
> +		filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED;
> +		kvm_gmem_zap(gmem, start, end, filter);
> +	}
>  
>  	/*
>  	 * Do not truncate the range, what action is taken in response to the
> -- 
> 2.43.2
