Re: [RFC PATCH v2 17/23] KVM: guest_memfd: Split for punch hole and private-to-shared conversion

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Ackerley Tng <ackerleytng@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>, pbonzini@redhat.com, seanjc@google.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	x86@kernel.org,  rick.p.edgecombe@intel.com,
	dave.hansen@intel.com, kas@kernel.org,  tabba@google.com,
	quic_eberman@quicinc.com, michael.roth@amd.com,
	 david@redhat.com, vannapurve@google.com, vbabka@suse.cz,
	 thomas.lendacky@amd.com, pgonda@google.com,
	zhiquan1.li@intel.com,  fan.du@intel.com, jun.miao@intel.com,
	ira.weiny@intel.com,  isaku.yamahata@intel.com,
	xiaoyao.li@intel.com, binbin.wu@linux.intel.com,
	 chao.p.peng@intel.com, yan.y.zhao@intel.com
Subject: Re: [RFC PATCH v2 17/23] KVM: guest_memfd: Split for punch hole and private-to-shared conversion
Date: Wed, 01 Oct 2025 08:00:21 +0000	[thread overview]
Message-ID: <diqzecrn2gru.fsf@google.com> (raw)
In-Reply-To: <20250807094503.4691-1-yan.y.zhao@intel.com>

Yan Zhao <yan.y.zhao@intel.com> writes:

I was looking deeper into this patch since on my WIP tree I already had
the invalidate and zap steps separated out and had to do more to rebase
this patch :)

> In TDX, private page tables require precise zapping because faulting back
> the zapped mappings necessitates the guest's re-acceptance.

I feel that this statement could be better phrased because all zapped
mappings require re-acceptance, not just anything related to precise
zapping. Would this be better:

    On private-to-shared conversions, page table entries must be zapped
    from the Secure EPTs. Any pages mapped into Secure EPTs must be
    accepted by the guest before they are used.

    Hence, care must be taken to only precisely zap ranges requested for
    private-to-shared conversion, since the guest is only prepared to
    re-accept precisely the ranges it requested for conversion.

    The guest may request to convert ranges not aligned with private
    page table entry boundaries. To precisely zap these ranges, huge
    leaves that span the boundaries of the requested ranges must be
    split into smaller leaves, so that the split, smaller leaves now
    align with the requested range for zapping.

> Therefore,
> before performing a zap for hole punching and private-to-shared
> conversions, huge leafs that cross the boundary of the zapping GFN range in
> the mirror page table must be split.
>
> Splitting may result in an error. If this happens, hole punching and
> private-to-shared conversion should bail out early and return an error to
> userspace.
>
> Splitting is not necessary for kvm_gmem_release() since the entire page
> table is being zapped, nor for kvm_gmem_error_folio() as an SPTE must not
> map more than one physical folio.
>

I think splitting is not necessary as long as aligned page table entries
are zapped. Splitting is also not necessary if the entire page table is
zapped but that's a superset of zapping aligned page table
entries. (Probably just a typo on your side.) Here's my attempt at
rephrasing this:

    Splitting is not necessary for the cases where only aligned page
    table entries are zapped, such as during kvm_gmem_release() where
    the entire guest_memfd worth of memory is zapped, nor for
    truncation, where truncation of pages within a huge folio is not
    allowed.

> Therefore, in this patch,
> - break kvm_gmem_invalidate_begin_and_zap() into
>   kvm_gmem_invalidate_begin() and kvm_gmem_zap() and have
>   kvm_gmem_release() and kvm_gmem_error_folio() to invoke them.
>
> - have kvm_gmem_punch_hole() to invoke kvm_gmem_invalidate_begin(),
>   kvm_gmem_split_private(), and kvm_gmem_zap().
>   Bail out if kvm_gmem_split_private() returns error.
>
> - drop the old kvm_gmem_unmap_private() and have private-to-shared
>   conversion to invoke kvm_gmem_split_private() and kvm_gmem_zap() instead.
>   Bail out if kvm_gmem_split_private() returns error.
>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> RFC v2:
> - Rebased to [1]. As changes in this patch are gmem specific, they may need
>   to be updated if the implementation in [1] changes.
> - Update kvm_split_boundary_leafs() to kvm_split_cross_boundary_leafs() and
>   invoke it before kvm_gmem_punch_hole() and private-to-shared conversion.
>
> [1] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com/
>
> RFC v1:
> - new patch.
> 
> [...snip...]
>

next prev parent reply	other threads:[~2025-10-01  8:00 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-07  9:39 [RFC PATCH v2 00/23] KVM: TDX huge page support for private memory Yan Zhao
2025-08-07  9:41 ` [RFC PATCH v2 01/23] x86/tdx: Enhance tdh_mem_page_aug() to support huge pages Yan Zhao
2025-08-07  9:41 ` [RFC PATCH v2 02/23] x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_page_demote() Yan Zhao
2025-09-01  8:55   ` Binbin Wu
2025-09-01  9:08     ` Yan Zhao
2025-09-02 16:56       ` Edgecombe, Rick P
2025-09-02 17:37         ` Sean Christopherson
2025-09-02 17:45           ` Edgecombe, Rick P
2025-09-04  9:31             ` Yan Zhao
2025-11-11  9:15       ` Huang, Kai
2025-11-12  8:06         ` Yan Zhao
2025-11-14  9:14           ` Binbin Wu
2025-11-14  9:21             ` Yan Zhao
2025-08-07  9:42 ` [RFC PATCH v2 03/23] x86/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge pages Yan Zhao
2025-11-11  9:23   ` Huang, Kai
2025-11-12  8:43     ` Yan Zhao
2025-11-12 10:29       ` Huang, Kai
2025-11-13  2:35         ` Yan Zhao
2025-11-13  7:37           ` Huang, Kai
2025-11-13  9:03             ` Yan Zhao
2025-11-13 15:26             ` Dave Hansen
2025-11-14  1:21               ` Yan Zhao
2025-12-10  1:14   ` Vishal Annapurve
2025-12-10  1:18     ` Yan Zhao
2025-12-10  1:30       ` Vishal Annapurve
2025-12-10  1:55         ` Yan Zhao
2025-12-31 19:37           ` Vishal Annapurve
2026-01-06 10:37             ` Yan Zhao
2025-08-07  9:42 ` [RFC PATCH v2 04/23] KVM: TDX: Introduce tdx_clear_folio() to clear " Yan Zhao
2025-09-02  2:56   ` Binbin Wu
2025-09-03  9:51     ` Yan Zhao
2025-09-03 11:19       ` Binbin Wu
2025-09-04  2:53         ` Yan Zhao
2025-08-07  9:42 ` [RFC PATCH v2 05/23] x86/tdx: Enhance tdh_phymem_page_reclaim() to support " Yan Zhao
2025-11-17  2:09   ` Binbin Wu
2025-11-17  4:05     ` Yan Zhao
2025-08-07  9:42 ` [RFC PATCH v2 06/23] KVM: TDX: Do not hold page refcount on private guest pages Yan Zhao
2025-08-07  9:42 ` [RFC PATCH v2 07/23] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root Yan Zhao
2025-08-07  9:43 ` [RFC PATCH v2 08/23] KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting Yan Zhao
2025-11-11  9:52   ` Huang, Kai
2025-11-12  9:29     ` Yan Zhao
2025-08-07  9:43 ` [RFC PATCH v2 09/23] KVM: x86/tdp_mmu: Add split_external_spt hook called during write mmu_lock Yan Zhao
2025-11-11 10:06   ` Huang, Kai
2025-11-13  3:16     ` Yan Zhao
2025-11-17  8:53   ` Binbin Wu
2025-11-17  9:09     ` Yan Zhao
2025-08-07  9:43 ` [RFC PATCH v2 10/23] KVM: TDX: Enable huge page splitting under write kvm->mmu_lock Yan Zhao
2025-11-11 10:20   ` Huang, Kai
2025-11-13  5:53     ` Yan Zhao
2025-11-17  9:17       ` Binbin Wu
2025-11-17  9:26         ` Yan Zhao
2025-12-09 23:49   ` Sagi Shahar
2025-12-09 23:54     ` Edgecombe, Rick P
2025-12-10  0:28       ` Sagi Shahar
2025-12-10  0:50         ` Yan Zhao
2025-12-10 17:16           ` Sagi Shahar
2025-12-10 19:49             ` Edgecombe, Rick P
2025-12-11  2:10               ` Yan Zhao
2025-08-07  9:43 ` [RFC PATCH v2 11/23] KVM: x86: Reject splitting huge pages under shared mmu_lock for mirror root Yan Zhao
2025-09-03  3:30   ` Binbin Wu
2025-08-07  9:43 ` [RFC PATCH v2 12/23] KVM: x86/mmu: Introduce kvm_split_cross_boundary_leafs() Yan Zhao
2025-09-03  6:57   ` Binbin Wu
2025-09-03  9:44     ` Yan Zhao
2025-11-11 10:42   ` Huang, Kai
2025-11-13  8:54     ` Yan Zhao
2025-11-13 11:02       ` Huang, Kai
2025-11-13 11:40         ` Huang, Kai
2025-11-14  6:09         ` Yan Zhao
2025-11-18  0:14           ` Huang, Kai
2025-11-18  6:30             ` Yan Zhao
2025-11-18  8:59               ` Binbin Wu
2025-11-18 10:49               ` Huang, Kai
2025-11-19  3:41                 ` Yan Zhao
2026-01-06 10:35                   ` Yan Zhao
2025-11-19  6:23                 ` Yan Zhao
2025-11-19  6:31   ` Yan Zhao
2025-08-07  9:44 ` [RFC PATCH v2 13/23] KVM: x86: Introduce hugepage_set_guest_inhibit() Yan Zhao
2025-08-07  9:44 ` [RFC PATCH v2 14/23] KVM: TDX: Split and inhibit huge mappings if a VMExit carries level info Yan Zhao
2025-09-03  7:36   ` Binbin Wu
2025-09-03  9:37     ` Yan Zhao
2025-11-11 10:55   ` Huang, Kai
2025-11-14  1:42     ` Yan Zhao
2025-11-18  0:26       ` Huang, Kai
2025-11-18  2:44         ` Yan Zhao
2025-11-11 11:05   ` Huang, Kai
2025-11-14  7:22     ` Yan Zhao
2025-11-18  1:04       ` Huang, Kai
2025-11-18  2:20         ` Yan Zhao
2025-11-18  9:44           ` Huang, Kai
2025-11-19  2:58             ` Yan Zhao
2025-11-19  5:51   ` Binbin Wu
2025-11-19  6:29     ` Yan Zhao
2025-11-19  6:39       ` Binbin Wu
2025-08-07  9:44 ` [RFC PATCH v2 15/23] KVM: Change the return type of gfn_handler_t() from bool to int Yan Zhao
2025-08-07  9:44 ` [RFC PATCH v2 16/23] KVM: x86: Split cross-boundary mirror leafs for KVM_SET_MEMORY_ATTRIBUTES Yan Zhao
2025-08-07  9:45 ` [RFC PATCH v2 17/23] KVM: guest_memfd: Split for punch hole and private-to-shared conversion Yan Zhao
2025-09-04  7:58   ` Binbin Wu
2025-09-04  9:48     ` Yan Zhao
2025-09-04 11:07       ` Yan Zhao
2025-10-01  6:21   ` Ackerley Tng
2025-10-13  0:18     ` Yan Zhao
2025-10-01  8:00   ` Ackerley Tng [this message]
2025-10-13  0:45     ` Yan Zhao
2025-08-07  9:45 ` [RFC PATCH v2 18/23] x86/virt/tdx: Do not perform cache flushes unless CLFLUSH_BEFORE_ALLOC is set Yan Zhao
2025-08-11 21:10   ` Sagi Shahar
2025-08-12  6:37     ` Yan Zhao
2025-09-04  8:16   ` Binbin Wu
2025-09-04  9:50     ` Yan Zhao
2025-09-05  9:05       ` Binbin Wu
2025-09-05 15:41   ` Edgecombe, Rick P
2025-09-15  6:05     ` Yan Zhao
2025-08-07  9:45 ` [RFC PATCH v2 19/23] KVM: TDX: Pass down pfn to split_external_spt() Yan Zhao
2025-09-04  8:30   ` Binbin Wu
2025-08-07  9:45 ` [RFC PATCH v2 20/23] KVM: TDX: Handle Dynamic PAMT in tdh_mem_page_demote() Yan Zhao
2025-08-07  9:46 ` [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used in split path Yan Zhao
2025-09-04  9:17   ` Binbin Wu
2025-09-04  9:58     ` Yan Zhao
2025-12-05  6:14   ` Sagi Shahar
2025-12-08  5:49     ` Yan Zhao
2025-12-11  1:42       ` Vishal Annapurve
2025-12-11  2:36         ` Yan Zhao
2025-08-07  9:46 ` [RFC PATCH v2 22/23] KVM: TDX: Handle Dynamic PAMT on page split Yan Zhao
2025-08-14  5:31   ` Vishal Annapurve
2025-08-14 18:29     ` Vishal Annapurve
2025-08-18  4:19     ` Yan Zhao
2025-08-07  9:46 ` [RFC PATCH v2 23/23] KVM: TDX: Turn on PG_LEVEL_2M after TD is RUNNABLE Yan Zhao
2025-11-11 11:25   ` Huang, Kai
2025-11-14  8:34     ` Yan Zhao
2025-11-18  0:56       ` Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=diqzecrn2gru.fsf@google.com \
    --to=ackerleytng@google.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=chao.p.peng@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=fan.du@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jun.miao@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=pgonda@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=seanjc@google.com \
    --cc=tabba@google.com \
    --cc=thomas.lendacky@amd.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=zhiquan1.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox