From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D590B35836E for ; Tue, 23 Jun 2026 01:15:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782177349; cv=none; b=L40x8UwFkxnQ0ojy5Q+shsHSxh6mDCPVinUCkbZLdsfgev7zzRMbc794HrutPaCk3R/PuFs0FxdTmhVpgu0IK11Njyb6Q6obkU1P+XtquEkTtXIxO/Dfz2+5me5cVV5euohq8KzA+RwonpLgr19V7d8ljLOv6Wn4gijTml47mz4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782177349; c=relaxed/simple; bh=CJC2eEAu9a9FSgizYGmOPBhbJTloZgwrRUlwAxJ41/g=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ofqWr28OOjZjB5OCPajguE+BeonHShOIdP2dh15WLPNKyjD4RXWrc7NZDbx4pcfQ6z4neaKm4FIeHEtolw1ZbYR+LjovNZmjZ00b0J7PqGuSvGGGRjuoH/65mCEONI95Zn0vBc/Z7Qog/6kIX/vA525qhZqd+pq/5ZEpZm9nSCg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=cdno2SBN; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cdno2SBN" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-8421f5d76aaso3454471b3a.2 for ; Mon, 22 Jun 2026 18:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1782177347; x=1782782147; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=cIaeqJX/r6ONezV3RTMNlvuGHs6v4DCaJbK0FNFaqOE=; b=cdno2SBNB9DpXRN00nZKja+FMHqRnIpYgfaHVnysBzv/RtkarjGmq/2gryLAW/W3oQ qY3hjgKZzhZcZrEFJlOb84R5zz7Lr8kIVYp9m5k4kyxxxJ1s6JyJTUUOjF1KESGr1tYD laUgFKB7WoN+cu9XGQEigVb+o27F1a/twJ2MwraRrs6szoGSlDgk2Pz6aOZ1f3mCf5Dv AwWwXsXlSAshXdU9Fnlhuk03ut/NgiUGRlEPzXyL/HH2SPUvYMEP0xrSAqulk4woQbLq pofg6HTXLZzRDrITf2yxSrQglLdwNraR8uUtdWJWqLJ1pvsgv3/edp0o2mG1LQ2ISpcE wY9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782177347; x=1782782147; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cIaeqJX/r6ONezV3RTMNlvuGHs6v4DCaJbK0FNFaqOE=; b=lVm1tsGUNOM1yazLzvmMuhY3Uyf/l+JBltPexmSOlu2Z6glC7RfuwD/LL4gYS9aICR CkpPdpChyFo5W02eDjqIG5rd3tfqb46WTdHCOumPCXNs1edkQ0nEP1RzjgjEiS31MYWQ z87fiRokvNuDhwMLd/uxh3wqWxAonD0j10dzxaDdDUD35pjp5DCUr6mvad9ZkwIkQGs9 awx8ndTnFpr0FkbDUmG2IIfrSUlVDaTFPXtEVoVFeAQstjp8JRaQPUBLXhfOe4s/2/wl 6AFUHTvVk0iAuxANz5BJDcB5RL/P/4VNdB81w75HvFo7LzrR1f9BRSGMGu8azkWrljWA MxQA== X-Forwarded-Encrypted: i=1; AFNElJ/apsjwLyeNrGcuM7Y/03ZKgVzBNti1Kr8k1icwrmQ09YECnmt4vmWGOrOnpHIafvs6Jk45LD+3IWcpI58=@vger.kernel.org X-Gm-Message-State: AOJu0YyH4VckIACY335LNAmJ4dezy/QHgGXFaMO6i1Qj/z28WWO7AnNO tTX0C9cRiECrw+QYh7LeZOW7hbFPrMci9tySJjOJyjfAFP/e2ES922kaH61wfGMSwr7qz/svHUt 3QHrC8Q== X-Received: from pfbim3.prod.google.com ([2002:a05:6a00:8d83:b0:845:2f64:1f67]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:22c5:b0:82f:38df:681c with SMTP id d2e1a72fcca58-8459704a890mr377593b3a.6.1782177346471; Mon, 22 Jun 2026 18:15:46 -0700 (PDT) Date: Mon, 22 Jun 2026 18:15:45 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260618-gmem-inplace-conversion-v8-0-9d2959357853@google.com> <20260618-gmem-inplace-conversion-v8-15-9d2959357853@google.com> Message-ID: Subject: Re: [PATCH v8 15/46] KVM: guest_memfd: Call arch invalidate hooks on conversion From: Sean Christopherson To: Fuad Tabba Cc: ackerleytng@google.com, aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com, brauner@kernel.org, chao.p.peng@linux.intel.com, david@kernel.org, jmattson@google.com, jthoughton@google.com, michael.roth@amd.com, oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com, rick.p.edgecombe@intel.com, rientjes@google.com, shivankg@amd.com, steven.price@arm.com, willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com, forkloop@google.com, pratyush@kernel.org, suzuki.poulose@arm.com, aneesh.kumar@kernel.org, liam@infradead.org, Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Shuah Khan , Shuah Khan , Vishal Annapurve , Andrew Morton , Chris Li , Kairui Song , Kemeng Shi , Nhat Pham , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Youngjun Park , Qi Zheng , Shakeel Butt , Kiryl Shutsemau , Baoquan He , Jason Gunthorpe , Vlastimil Babka , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev Content-Type: text/plain; charset="us-ascii" On Fri, Jun 19, 2026, Fuad Tabba wrote: > On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay > wrote: > > > > From: Ackerley Tng > > > > When memory in guest_memfd is converted from private to shared, the > > platform-specific state associated with the guest-private pages must be > > invalidated or cleaned up. > > > > Iterate over the folios in the affected range and call the > > kvm_arch_gmem_invalidate() hook for each PFN range. This allows > > architectures to perform necessary teardown, such as updating hardware > > metadata or encryption states, before the pages are transitioned to the > > shared state. > > > > Invoke this helper after indicating to KVM's mmu code that an invalidation > > is in progress to stop in-flight page faults from succeeding. > > > > Reviewed-by: Fuad Tabba > > Signed-off-by: Ackerley Tng > > Coming back to this after working through the arm64/pKVM side. My > Reviewed-by here is from the previous round and the patch hasn't > changed, but I missed an implication for arm64. > > kvm_arch_gmem_invalidate() is now called from two paths with the same > (start, end) signature: folio teardown (kvm_gmem_free_folio) and > private->shared conversion (here). For SNP/TDX that's fine, conversion is > destructive anyway. For pKVM the two need opposite content semantics: > conversion must preserve the page in place (same physical page, the point > of in-place conversion without encryption), while teardown must scrub it > before returning it to the host. > > The hook gets only a pfn range with no indication of which caller it's > serving, so arm64 can't give the two paths the behaviour they need. It > would help to signal intent on the conversion path: a reason/flag, a > separate hook, or not routing non-destructive conversion through the > teardown hook. > > arm64 isn't here yet, so this isn't urgent, but the hook is gaining a > second caller now, and it's cheaper to leave room for the distinction > than to change a generic contract other arches depend on later. Crud. It may not be urgent for arm64, but it's urgent for other reasons that I "can't" describe in detail at the moment, and even if that weren't the case, I think we should clean things up now. More below. > > virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 41 insertions(+) > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > index 433f79047b9d1..3c94442bc8131 100644 > > --- a/virt/kvm/guest_memfd.c > > +++ b/virt/kvm/guest_memfd.c > > @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start, > > return safe; > > } > > > > +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE > > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) Not your fault, but kvm_arch_gmem_invalidate() is badly misnamed. It's not "invalidating" anything, it's much more of a "free" callback, as SNP uses it to put physical pages back into a shared state when a maybe-private folio is freed. As Fuad points out, (ab)using that hook for the private=>shared conversion case "works", but not broadly. And it makes the bad name worse, because it's called from code that _is_ doing true invalidations. For pKVM, it may not even need to do anything invalidation-like. To avoid a conflict with patches that are going to have priority over this series, to set the stage for arm64 support, and to avoid avoid bleeding vendor details into guest_memfd, as if they are core guest_memfd behavior (only SNP needs the "invalidation" on this specific transition), I think we should add an arch hook to do conversions straightaway. Unless there's a clever option I'm missing, it'll mean adding yet another HAVE_KVM_ARCH_GMEM_XXX flag? Hmm, especially because IIUC, arm64/pKVM doesn't need a callback for this case, only the free_folio case. > > +{ > > + struct folio_batch fbatch; > > + pgoff_t next = start; > > + int i; > > + > > + folio_batch_init(&fbatch); > > + while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) { > > + for (i = 0; i < folio_batch_count(&fbatch); ++i) { > > + struct folio *folio = fbatch.folios[i]; > > + pgoff_t start_index, end_index; > > + kvm_pfn_t start_pfn, end_pfn; > > + > > + start_index = max(start, folio->index); > > + end_index = min(end, folio_next_index(folio)); > > + /* > > + * end_index is either in folio or points to > > + * the first page of the next folio. Hence, > > + * all pages in range [start_index, end_index) > > + * are contiguous. > > + */ > > + start_pfn = folio_file_pfn(folio, start_index); > > + end_pfn = start_pfn + end_index - start_index; > > + > > + kvm_arch_gmem_invalidate(start_pfn, end_pfn); > > + } > > + > > + folio_batch_release(&fbatch); > > + cond_resched(); > > + } > > +} > > +#else > > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {} > > +#endif > > + > > static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start, > > size_t nr_pages, uint64_t attrs, > > pgoff_t *err_index) > > @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start, > > */ > > > > kvm_gmem_invalidate_start(inode, start, end); > > + > > + if (!to_private) > > + kvm_gmem_invalidate(inode, start, end); E.g. instead make this something like this? kvm_gmem_set_pfn_attributes(...) Hrm, though that wastes folio lookups in the to_private case. So maybe just this, assuming pKVM doesn't need to take additional action on conversions? if (!to_private) kvm_gmem_make_shared(...) Actually, if we do that, then we don't need a separate arch hook, just a separate config. It'll still bleed SNP details into guest_memfd, but it'll at least be done in a way that's more explicitly arch specific (and it's no different than what we already do for PREPARE...). E.g. this? There will still be a looming rename conflict, but that's easy enough to handle. diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c index 9ce5be7843f2..8aead0abd788 100644 --- virt/kvm/guest_memfd.c +++ virt/kvm/guest_memfd.c @@ -648,8 +648,8 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start, return safe; } -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE -static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) +#ifdef CONFIG_KVM_ARCH_GMEM_FREE_ON_SHARED_CONVERSION +static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end) { struct folio_batch fbatch; pgoff_t next = start; @@ -681,7 +681,7 @@ static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) } } #else -static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {} +static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end) { } #endif static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start, @@ -729,7 +729,7 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start, kvm_gmem_invalidate_start(inode, start, end); if (!to_private) - kvm_gmem_invalidate(inode, start, end); + kvm_gmem_make_shared(inode, start, end); mas_store_prealloc(&mas, xa_mk_value(attrs));