From: Ackerley Tng <ackerleytng@google.com>
To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
brauner@kernel.org, chao.p.peng@linux.intel.com,
david@kernel.org, ira.weiny@intel.com, jmattson@google.com,
jroedel@suse.de, jthoughton@google.com, michael.roth@amd.com,
oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com,
rick.p.edgecombe@intel.com, rientjes@google.com,
shivankg@amd.com, steven.price@arm.com, tabba@google.com,
willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
forkloop@google.com, pratyush@kernel.org,
suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>,
Vishal Annapurve <vannapurve@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Vlastimil Babka <vbabka@kernel.org>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
Ackerley Tng <ackerleytng@google.com>
Subject: [PATCH RFC v4 13/44] KVM: guest_memfd: Apply content modes while setting memory attributes
Date: Thu, 26 Mar 2026 15:24:22 -0700
Message-ID: <20260326-gmem-inplace-conversion-v4-13-e202fe950ffd@google.com>
In-Reply-To: <20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com>
Provide defined memory content modes so that KVM can make guarantees about
memory content after setting memory attributes, according to userspace
requests.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
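Reviewer note (not part of the commit): the flag handling added to
kvm_gmem_set_attributes() and kvm_gmem_content_mode_is_supported() can be
modelled in plain userspace C as below. This is only a sketch for reasoning
about which errno each flag combination should produce. The names
model_check_content_mode(), model_check_examples() and the arch_supported
parameter are hypothetical; arch_supported stands in for the return value of
kvm_arch_gmem_supported_content_modes(), and the other argument checks in the
ioctl path are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/uapi/linux/kvm.h hunk in this patch. */
#define KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED	0
#define KVM_SET_MEMORY_ATTRIBUTES2_ZERO			(1ULL << 0)
#define KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE		(1ULL << 1)

/*
 * Hypothetical userspace model, not kernel code. Returns 0 if the request
 * would pass the content mode checks, or a negative errno otherwise.
 */
static int model_check_content_mode(uint64_t flags, bool to_private,
				    uint64_t arch_supported)
{
	const uint64_t valid = KVM_SET_MEMORY_ATTRIBUTES2_ZERO |
			       KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE;

	/* Unknown bits, or ZERO and PRESERVE together, are invalid. */
	if (flags & ~valid)
		return -EINVAL;
	if ((flags & KVM_SET_MEMORY_ATTRIBUTES2_ZERO) &&
	    (flags & KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE))
		return -EINVAL;

	/* No content mode requested: nothing to guarantee. */
	if (flags == KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED)
		return 0;

	/* ZERO is only supported for private-to-shared conversions. */
	if (flags == KVM_SET_MEMORY_ATTRIBUTES2_ZERO && to_private)
		return -EOPNOTSUPP;

	return (arch_supported & flags) ? 0 : -EOPNOTSUPP;
}

/* Example requests and the result the model gives for each. */
static void model_check_examples(void)
{
	const uint64_t both = KVM_SET_MEMORY_ATTRIBUTES2_ZERO |
			      KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE;

	assert(model_check_content_mode(0, true, 0) == 0);
	assert(model_check_content_mode(KVM_SET_MEMORY_ATTRIBUTES2_ZERO,
					true, both) == -EOPNOTSUPP);
	assert(model_check_content_mode(KVM_SET_MEMORY_ATTRIBUTES2_ZERO,
					false, both) == 0);
	assert(model_check_content_mode(KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE,
					true, 0) == -EOPNOTSUPP);
	assert(model_check_content_mode(both, false, both) == -EINVAL);
	assert(model_check_content_mode(1ULL << 2, false, both) == -EINVAL);
}
```

Calling model_check_examples() from a test harness exercises each branch. The
model treats the flags field as carrying at most one content mode, which
matches the mutual-exclusion check in the patch.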
Documentation/virt/kvm/api.rst | 61 ++++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 4 +++
virt/kvm/guest_memfd.c | 56 ++++++++++++++++++++++++++++++++++++--
3 files changed, 119 insertions(+), 2 deletions(-)
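Reviewer note (not part of the commit): the two "Note:" paragraphs added to
api.rst say that a content mode applies to the whole requested range, but only
to memory that is already allocated. A toy model of that rule for a
private-to-shared request with ZERO might look like the sketch below. struct
model_page, model_set_shared_zero() and model_range_examples() are
hypothetical names for illustration and do not correspond to KVM data
structures. Recording the attribute for unallocated offsets is an assumption
of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page state; not a KVM structure. */
struct model_page {
	bool allocated;
	bool is_private;
	uint16_t data;
};

/*
 * Model of "set [start, end) to shared with the ZERO content mode".
 * Every allocated page in the range is zeroed, whether or not its
 * attribute actually changes. Unallocated pages get no content guarantee,
 * so the model leaves their data field untouched.
 */
static void model_set_shared_zero(struct model_page *pages, size_t start,
				  size_t end)
{
	for (size_t i = start; i < end; i++) {
		if (pages[i].allocated)
			pages[i].data = 0;
		pages[i].is_private = false;
	}
}

/* Mirrors the layouts used in the documentation examples. */
static void model_range_examples(void)
{
	struct model_page pages[3] = {
		{ .allocated = true,  .is_private = false, .data = 0xbeef },
		{ .allocated = true,  .is_private = true,  .data = 0xbeef },
		{ .allocated = false, .is_private = false, .data = 0xbeef },
	};

	model_set_shared_zero(pages, 0, 3);

	/* Already shared, but still inside the requested range: zeroed. */
	assert(pages[0].data == 0);
	/* Converted from private to shared: zeroed. */
	assert(pages[1].data == 0 && !pages[1].is_private);
	/* Not allocated: the content mode does not apply. */
	assert(pages[2].data == 0xbeef);
}
```

The first assertion is the case most likely to surprise callers: a page that
was already shared is zeroed too, because the mode follows the requested range
rather than the set of pages whose attribute changed.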
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 15148c80cfdb6..90587a9c09d3f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6571,6 +6571,8 @@ Errors:
EAGAIN Some page within requested range had unexpected refcounts. The
offset of the page will be returned in `error_offset`.
ENOMEM Ran out of memory trying to track private/shared state
+ EOPNOTSUPP There is no way for KVM to guarantee in-memory contents as
+ requested.
========== ===============================================================
KVM_SET_MEMORY_ATTRIBUTES2 is an extension to
@@ -6619,6 +6621,65 @@ on the shared pages, such as refcounts taken by get_user_pages(), and
try the ioctl again. A possible source of these long term refcounts is
if the guest_memfd memory was pinned in IOMMU page tables.
+By default, KVM makes no guarantees about the in-memory values after
+memory is converted to/from shared/private. Optionally, userspace may
+instruct KVM to ensure the contents of memory are zeroed or preserved,
+e.g. to enable in-place sharing of data, or as an optimization to
+avoid re-zeroing memory when userspace can instead rely on the
+trusted entity to guarantee that the memory is zeroed as part of the
+conversion process.
+
+The content modes available are as follows:
+
+``KVM_SET_MEMORY_ATTRIBUTES2_ZERO``
+
+ On conversion, KVM guarantees all entities that have "allowed"
+ access to the memory will read zeros. E.g. on private to shared
+ conversion, both trusted and untrusted code will read zeros.
+
+ Zeroing is currently only supported for private-to-shared
+ conversions, as KVM in general is untrusted and thus cannot
+ guarantee the guest (or any trusted entity) will read zeros after
+ conversion. Note, some CoCo implementations do zero memory contents
+ such that the guest reads zeros after conversion, and the guest may
+ choose to rely on that behavior. However, that's a contract between
+ the trusted CoCo entity and the guest, not between KVM and the
+ guest.
+
+``KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE``
+
+ On conversion, KVM guarantees memory contents will be preserved with
+ respect to the last written unencrypted value. As a concrete
+ example, if the host writes ``0xbeef`` to shared memory and converts
+ the memory to private, the guest will also read ``0xbeef``, even if
+ the in-memory data is encrypted as part of the conversion. And vice
+ versa, if the guest writes ``0xbeef`` to private memory and then
+ converts the memory to shared, the host (and guest) will read
+ ``0xbeef`` (if the memory is accessible).
+
+Note: These content modes apply to the entire requested range, not
+just the parts of the range that underwent conversion. For example, if
+this was the initial state:
+
+ * [0x0000, 0x1000): shared
+ * [0x1000, 0x2000): private
+ * [0x2000, 0x3000): shared
+
+and range [0x0000, 0x3000) was set to shared, the content mode would
+apply to all memory in [0x0000, 0x3000), not just the range that
+underwent conversion [0x1000, 0x2000).
+
+Note: These content modes apply only to allocated memory. No
+guarantees are made on offset ranges that do not have memory allocated
+(yet). For example, if this was the initial state:
+
+ * [0x0000, 0x1000): shared
+ * [0x1000, 0x2000): not allocated
+ * [0x2000, 0x3000): shared
+
+and range [0x0000, 0x3000) was set to shared, the content mode would
+apply only to offset ranges [0x0000, 0x1000) and [0x2000, 0x3000).
+
See also: :ref: `KVM_SET_MEMORY_ATTRIBUTES`.
.. _kvm_run:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 29baaa60de35a..0fc9ad4ea0d93 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1642,6 +1642,10 @@ struct kvm_memory_attributes {
/* Available with KVM_CAP_MEMORY_ATTRIBUTES2 */
#define KVM_SET_MEMORY_ATTRIBUTES2 _IOWR(KVMIO, 0xd2, struct kvm_memory_attributes2)
+#define KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED 0
+#define KVM_SET_MEMORY_ATTRIBUTES2_ZERO (1ULL << 0)
+#define KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE (1ULL << 1)
+
struct kvm_memory_attributes2 {
union {
__u64 address;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index e270e54e030f0..eeac7678fcf4e 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -677,6 +677,19 @@ u64 __weak kvm_arch_gmem_supported_content_modes(struct kvm *kvm)
return 0;
}
+static bool kvm_gmem_content_mode_is_supported(struct kvm *kvm,
+ u64 content_mode,
+ bool to_private)
+{
+ if (content_mode == KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED)
+ return true;
+
+ if (content_mode == KVM_SET_MEMORY_ATTRIBUTES2_ZERO && to_private)
+ return false;
+
+ return kvm_arch_gmem_supported_content_modes(kvm) & content_mode;
+}
+
int kvm_gmem_apply_content_mode_zero(struct inode *inode, pgoff_t start,
pgoff_t end)
{
@@ -736,8 +749,26 @@ int __weak kvm_arch_gmem_apply_content_mode_preserve(struct kvm *kvm,
return -EOPNOTSUPP;
}
+static int kvm_gmem_apply_content_mode(struct kvm *kvm, uint64_t content_mode,
+ struct inode *inode, pgoff_t start,
+ pgoff_t end)
+{
+ switch (content_mode) {
+ case KVM_SET_MEMORY_ATTRIBUTES2_MODE_UNSPECIFIED:
+ return kvm_arch_gmem_apply_content_mode_unspecified(kvm, inode, start, end);
+ case KVM_SET_MEMORY_ATTRIBUTES2_ZERO:
+ return kvm_arch_gmem_apply_content_mode_zero(kvm, inode, start, end);
+ case KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE:
+ return kvm_arch_gmem_apply_content_mode_preserve(kvm, inode, start, end);
+ default:
+ WARN_ONCE(1, "Unexpected content mode requested.\n");
+ return -EOPNOTSUPP;
+ }
+}
+
static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
size_t nr_pages, uint64_t attrs,
+ struct kvm *kvm, uint64_t content_mode,
pgoff_t *err_index)
{
bool to_private = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
@@ -752,9 +783,23 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
filemap_invalidate_lock(mapping);
+ if (!kvm_gmem_content_mode_is_supported(kvm, content_mode,
+ to_private)) {
+ r = -EOPNOTSUPP;
+ *err_index = start;
+ goto out;
+ }
+
mas_init(&mas, mt, start);
if (kvm_gmem_range_has_attributes(mt, start, nr_pages, attrs)) {
+ /*
+ * Even if no update is required to attributes, the
+ * requested content mode is applied.
+ */
+ WARN_ON(kvm_gmem_apply_content_mode(kvm, content_mode,
+ inode, start, end));
+
r = 0;
goto out;
}
@@ -786,6 +831,9 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
if (!to_private)
kvm_gmem_invalidate(inode, start, end);
+ WARN_ON(kvm_gmem_apply_content_mode(kvm, content_mode, inode,
+ start, end));
+
mas_store_prealloc(&mas, xa_mk_value(attrs));
kvm_gmem_invalidate_end(inode, start, end);
@@ -807,7 +855,11 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
if (copy_from_user(&attrs, argp, sizeof(attrs)))
return -EFAULT;
- if (attrs.flags)
+ if (attrs.flags & ~(KVM_SET_MEMORY_ATTRIBUTES2_ZERO |
+ KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE))
+ return -EINVAL;
+ if ((attrs.flags & KVM_SET_MEMORY_ATTRIBUTES2_ZERO) &&
+ (attrs.flags & KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE))
return -EINVAL;
if (attrs.error_offset)
return -EINVAL;
@@ -829,7 +881,7 @@ static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
nr_pages = attrs.size >> PAGE_SHIFT;
index = attrs.offset >> PAGE_SHIFT;
r = __kvm_gmem_set_attributes(inode, index, nr_pages, attrs.attributes,
- &err_index);
+ f->kvm, attrs.flags, &err_index);
if (r) {
attrs.error_offset = ((uint64_t)err_index) << PAGE_SHIFT;
--
2.53.0.1018.g2bb0e51243-goog
Thread overview: 52+ messages
2026-03-26 22:24 [PATCH RFC v4 00/44] guest_memfd: In-place conversion support Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 01/44] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 02/44] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 03/44] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 04/44] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 05/44] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 06/44] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 07/44] KVM: guest_memfd: Only prepare folios for private pages Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 08/44] KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 09/44] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 11/44] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 12/44] KVM: guest_memfd: Introduce default handlers for content modes Ackerley Tng
2026-03-26 22:24 ` Ackerley Tng [this message]
2026-03-26 22:24 ` [PATCH RFC v4 14/44] KVM: x86: Add support for applying " Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 15/44] KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 16/44] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 17/44] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 18/44] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 19/44] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 20/44] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 21/44] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 22/44] KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 23/44] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 24/44] KVM: selftests: Test using guest_memfd for guest private memory Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 25/44] KVM: selftests: Test basic single-page conversion flow Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 26/44] KVM: selftests: Test conversion flow when INIT_SHARED Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 27/44] KVM: selftests: Test conversion precision in guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 28/44] KVM: selftests: Test conversion before allocation Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 29/44] KVM: selftests: Convert with allocated folios in different layouts Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 30/44] KVM: selftests: Test that truncation does not change shared/private status Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 31/44] KVM: selftests: Test that shared/private status is consistent across processes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 32/44] KVM: selftests: Test conversion with elevated page refcount Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 33/44] KVM: selftests: Test that conversion to private does not support ZERO Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 34/44] KVM: selftests: Support checking that data not equal expected Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 35/44] KVM: selftests: Test that not specifying a conversion flag scrambles memory contents Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 36/44] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 37/44] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 38/44] KVM: selftests: Provide common function to set memory attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 39/44] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 40/44] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 41/44] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 42/44] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 43/44] KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 44/44] KVM: selftests: Update private memory exits test to work with per-gmem attributes Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 0/6] guest_memfd in-place conversion selftests for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 1/6] KVM: selftests: Initialize guest_memfd with INIT_SHARED Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 2/6] KVM: selftests: Call snp_launch_update_data() providing copy of memory Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 3/6] KVM: selftests: Make guest_code_xsave more friendly Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 4/6] KVM: selftests: Allow specifying CoCo-privateness while mapping a page Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 5/6] KVM: selftests: Test conversions for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 6/6] KVM: selftests: Test content modes ZERO and PRESERVE " Ackerley Tng