From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, x86@kernel.org, aik@amd.com,
andrew.jones@linux.dev, binbin.wu@linux.intel.com, bp@alien8.de,
brauner@kernel.org, chao.p.peng@intel.com,
chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
corbet@lwn.net, dave.hansen@linux.intel.com, david@kernel.org,
hpa@zytor.com, ira.weiny@intel.com, jgg@nvidia.com,
jmattson@google.com, jroedel@suse.de, jthoughton@google.com,
maobibo@loongson.cn, mathieu.desnoyers@efficios.com,
maz@kernel.org, mhiramat@kernel.org, michael.roth@amd.com,
mingo@redhat.com, mlevitsk@redhat.com, oupton@kernel.org,
pankaj.gupta@amd.com, pbonzini@redhat.com, prsampat@amd.com,
qperret@google.com, ricarkol@google.com,
rick.p.edgecombe@intel.com, rientjes@google.com,
rostedt@goodmis.org, shivankg@amd.com, shuah@kernel.org,
steven.price@arm.com, tglx@linutronix.de, vannapurve@google.com,
vbabka@suse.cz, willy@infradead.org, wyihan@google.com,
yan.y.zhao@intel.com
Subject: Re: [RFC PATCH v2 09/37] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
Date: Thu, 12 Mar 2026 08:44:24 -0700
Message-ID: <abLfWHf89TxWqeGZ@google.com>
In-Reply-To: <CA+EHjTy2urW2Tj5czQDKUHdri7FCLfw2mafTgmmtFs+-7ueoiw@mail.gmail.com>
On Thu, Mar 12, 2026, Fuad Tabba wrote:
> Hi Ackerley,
>
> Before getting into the UAPI semantics, thank you for all the heavy
> lifting you've done here. Figuring out how to make it all work across
> the different platforms is not easy :)
>
> <snip>
>
> > The policy definitions below provide more details:
Please drop "CONTENT_POLICY" from the KVM documentation. From KVM's perspective,
these are not "policy", they are purely properties of the underlying memory.
Userspace will likely use the attributes to implement policy of some kind, but
KVM straight up doesn't care.
> > ``KVM_SET_MEMORY_ATTRIBUTES2_CONTENT_POLICY_ZERO`` (default)
The default behavior absolutely cannot be something that's not supported on
every conversion type.
> >
> > On a private to shared conversion, the host will read zeros from the
> > converted memory on the next fault after successful return of the
> > KVM_SET_MEMORY_ATTRIBUTES2 ioctl.
> >
> > This is not supported (-EOPNOTSUPP) for a shared to private
> > conversion. While some CoCo implementations do zero memory contents
> > such that the guest reads zeros after conversion, the guest is not
> > expected to trust host-provided zeroing, hence as a UAPI policy, KVM
> > does not make any such guarantees.
>
> The rationale for not supporting this in the UAPI isn't quite right
> and I think that the prohibition should be removed. It's true that the
> guest is not expected to trust host-provided zeroing. However, if the
> VMM invokes this ioctl with the ZERO policy, the zeroing is performed
> by the hypervisor, not by the (untrusted) host.
What entity zeros the data doesn't matter as far as KVM's ABI is concerned. That's
a motivating factor for providing ZERO, e.g. it allows userspace to elide additional
zeroing when it _knows_ the memory holds zeros, but that's orthogonal to KVM's
contract with userspace.
> Although pKVM handles fresh, zeroed memory provisioning via donation
> rather than attribute conversion, stating that the UAPI cannot make
> guarantees due to trust boundaries is incorrect. The hypervisor is
We should avoid using "hypervisor", because (a) it means different things to
different people and (b) even when there's consensus on what "hypervisor" means,
whether or not the hypervisor is trusted varies per implementation.
> precisely the entity the guest trusts to enforce
> this.
>
> The UAPI should define the semantics for a shared-to-private ZERO
> conversion, even if current architectures return -EOPNOTSUPP because
> they handle fresh memory provisioning via other mechanisms (like
> pKVM's donation path).
>
> How about something like the following:
>
> On a shared to private conversion, the hypervisor will zero the memory
Again, say _nothing_ about "the hypervisor". _How_ or when anything happens is
completely irrelevant, the only thing that matters here is _what_ happens.
> contents before mapping it into the guest's private address space,
> preventing the untrusted host from injecting arbitrary data into the
> guest. If an architecture handles zeroed-provisioning via mechanisms
> other than attribute conversion, it may return -EOPNOTSUPP.
No. I am 100% against bleeding vendor specific information into KVM's ABI for
this. What the vendor code does is irrelevant, the _only_ thing that matters
here is KVM's contract with userspace.
That doesn't mean pKVM guests can't rely on memory being zeroed, but that is a
contract between pKVM and its guests, not between KVM and host userspace.
> > For testing purposes, the KVM_X86_SW_PROTECTED_VM testing vehicle
> > will support this policy and ensure zeroing for conversions in both
> > directions.
> >
> > ``KVM_SET_MEMORY_ATTRIBUTES2_CONTENT_POLICY_PRESERVE``
> >
> > On private/shared conversions in both directions, memory contents
> > will be preserved and readable. As a concrete example, if the host
> > writes ``0xbeef`` to memory and converts the memory to shared, the
> > guest will also read ``0xbeef``, after any necessary hardware or
> > software provided decryption. After a reverse shared to private
> > conversion, the host will also read ``0xbeef``.
>
> I think that this example is backwards. If the host writes to memory,
> that memory is already shared, isn't it? Converting it to shared is
> redundant. More importantly, if memory undergoes a shared-to-private
> conversion, the host must lose access entirely.
Ya, it's messed up.
> Maybe a clearer example would reflect actual payload injection and
> bounce buffer sharing:
> - Shared-to-Private (Payload Injection): The host writes a payload
> (e.g., 0xbeef) to shared memory and converts it to private. The guest
> reads 0xbeef in its private address space. The host loses access.
> - Private-to-Shared (Bounce Buffer): The guest writes 0xbeef to
> private memory and converts it to shared. The host reads 0xbeef.
>
> > pKVM (ARM) is the first user of this policy. Since pKVM does not
> > protect memory with encryption, a content policy to preserve memory
> > will not involve any decryption. The guest will be able to
> > read what the host wrote with full content preservation.
>
> This is correct, but to be precise, I think it should explicitly
> mention Stage-2 page tables as the protection mechanism, maybe:
pKVM shouldn't be mentioned in here at all.
---
By default, KVM makes no guarantees about the in-memory values after memory is
converted to/from shared/private. Optionally, userspace may instruct KVM to
ensure the contents of memory are zeroed or preserved, e.g. to enable in-place
sharing of data, or as an optimization to avoid having to re-zero memory when
the trusted entity guarantees the memory will be zeroed after conversion.
The behaviors supported by a given KVM instance can be queried via <cap>. If
the requested behavior is unsupported, KVM will return -EOPNOTSUPP and
reject the conversion request. Note! The "ZERO" request is only supported for
private-to-shared conversions!
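A minimal userspace-side model of the acceptance rules above, as a sketch only:
CONV_ZERO, CONV_PRESERVE, their bit values, check_conversion() and the
"supported" mask (standing in for whatever <cap> ends up reporting) are all
hypothetical, not KVM UAPI. It encodes just the two stated rules: unsupported
requests fail with -EOPNOTSUPP, and ZERO is rejected for shared-to-private.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the proposed KVM_SET_MEMORY_ATTRIBUTES2_ZERO and
 * KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE flags; the bit values are made up. */
#define CONV_ZERO     (1u << 0)
#define CONV_PRESERVE (1u << 1)

enum conv_dir { PRIVATE_TO_SHARED, SHARED_TO_PRIVATE };

/*
 * Model of the proposed rules. flags == 0 requests the default behavior (no
 * guarantees about contents) and is accepted for every direction. A requested
 * behavior the instance does not support, or ZERO on a shared-to-private
 * conversion, is rejected with -EOPNOTSUPP and nothing is converted.
 * 'supported' stands in for whatever <cap> reports.
 */
static int check_conversion(uint32_t supported, uint32_t flags,
			    enum conv_dir dir)
{
	if (flags & ~supported)
		return -EOPNOTSUPP;
	if ((flags & CONV_ZERO) && dir == SHARED_TO_PRIVATE)
		return -EOPNOTSUPP;
	return 0;
}

/* Walk through the cases spelled out in the proposed text. */
static void check_conversion_examples(void)
{
	const uint32_t all = CONV_ZERO | CONV_PRESERVE;

	/* Default: always accepted, even if nothing else is supported. */
	assert(check_conversion(all, 0, PRIVATE_TO_SHARED) == 0);
	assert(check_conversion(0, 0, SHARED_TO_PRIVATE) == 0);
	/* ZERO: private-to-shared only. */
	assert(check_conversion(all, CONV_ZERO, PRIVATE_TO_SHARED) == 0);
	assert(check_conversion(all, CONV_ZERO, SHARED_TO_PRIVATE) == -EOPNOTSUPP);
	/* PRESERVE: both directions, but only if the instance supports it. */
	assert(check_conversion(all, CONV_PRESERVE, SHARED_TO_PRIVATE) == 0);
	assert(check_conversion(CONV_ZERO, CONV_PRESERVE, PRIVATE_TO_SHARED) == -EOPNOTSUPP);
}
```

Modelling the default as always accepted mirrors the point made earlier in the
thread: the default can't be something that isn't supported on every
conversion type.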
``KVM_SET_MEMORY_ATTRIBUTES2_ZERO``
On conversion, KVM guarantees all entities that have "allowed" access to the
memory will read zeros. E.g. on private to shared conversion, both trusted
and untrusted code will read zeros.
Zeroing is currently only supported for private-to-shared conversions, as KVM
in general is untrusted and thus cannot guarantee the guest (or any trusted
entity) will read zeros after conversion. Note, some CoCo implementations do
zero memory contents such that the guest reads zeros after conversion, and
the guest may choose to rely on that behavior. But that's a contract between
the trusted CoCo entity and the guest, not between KVM and the guest.
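A toy model of the ZERO guarantee, and of how userspace might use it to elide
its own zeroing when available. Everything here (toy_page, toy_to_shared(),
to_shared_zeroed(), CONV_ZERO, the 64-byte "page") is made up for
illustration; only private-to-shared is modelled since that's the only
direction ZERO is defined for. The 0xa5 fill in the default case is an
arbitrary choice to make "no guarantees" visible, not something KVM does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 64          /* tiny "page" for the model */
#define CONV_ZERO  (1u << 0)   /* hypothetical stand-in for ..._ZERO */

struct toy_page {
	bool shared;
	uint8_t data[PAGE_BYTES];
};

/*
 * Toy private-to-shared conversion. With CONV_ZERO, every reader afterwards
 * sees zeros. Without it, the contents are unspecified; the model fills them
 * with 0xa5 to make "no guarantees" visible.
 */
static int toy_to_shared(struct toy_page *p, uint32_t supported, uint32_t flags)
{
	if (flags & ~supported)
		return -EOPNOTSUPP;  /* rejected: nothing is converted */
	memset(p->data, (flags & CONV_ZERO) ? 0x00 : 0xa5, sizeof(p->data));
	p->shared = true;
	return 0;
}

/*
 * Userspace helper: request ZERO when it is available so the extra memset()
 * can be skipped, otherwise fall back to the default and zero manually.
 */
static int to_shared_zeroed(struct toy_page *p, uint32_t supported)
{
	int r = toy_to_shared(p, supported, CONV_ZERO);

	if (r != -EOPNOTSUPP)
		return r;
	r = toy_to_shared(p, supported, 0);
	if (!r)
		memset(p->data, 0, sizeof(p->data));
	return r;
}
```

Either way the caller ends up with zeroed shared memory; ZERO just lets it
skip a redundant pass when KVM can make the guarantee.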
``KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE``
On conversion, KVM guarantees memory contents will be preserved with respect
to the last written unencrypted value. As a concrete example, if the host
writes ``0xbeef`` to shared memory and converts the memory to private, the
guest will also read ``0xbeef``, even if the in-memory data is encrypted as
part of the conversion. And vice versa, if the guest writes ``0xbeef`` to
private memory and then converts the memory to shared, the host (and guest)
will read ``0xbeef`` (if the memory is accessible).
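A toy model of PRESERVE, assuming private memory is "encrypted" by XOR with a
fixed key purely so the raw in-memory value differs from what readers see.
The names (toy_word, toy_read(), toy_write(), toy_convert_preserve(), TOY_KEY)
are hypothetical, and real memory encryption works nothing like this; the
sketch only shows the last written unencrypted value surviving a conversion in
either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: private data is stored "encrypted" with a fixed XOR key (a
 * stand-in for hardware memory encryption; real schemes are nothing like
 * this). Readers always see the plaintext view for the current state.
 */
#define TOY_KEY 0x5a5au

struct toy_word {
	bool shared;
	uint16_t raw;  /* what is physically in memory */
};

static uint16_t toy_read(const struct toy_word *w)
{
	return w->shared ? w->raw : (uint16_t)(w->raw ^ TOY_KEY);
}

static void toy_write(struct toy_word *w, uint16_t val)
{
	w->raw = w->shared ? val : (uint16_t)(val ^ TOY_KEY);
}

/*
 * PRESERVE conversion: re-encode the in-memory data so the last written
 * plaintext value survives the change of state.
 */
static void toy_convert_preserve(struct toy_word *w, bool to_shared)
{
	uint16_t plain = toy_read(w);

	w->shared = to_shared;
	toy_write(w, plain);
}
```

In the shared-to-private case the raw bytes change but toy_read() still
returns 0xbeef, which is the "even if the in-memory data is encrypted" part of
the guarantee.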