public inbox for linux-doc@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
	 brauner@kernel.org, chao.p.peng@linux.intel.com,
	david@kernel.org,  ira.weiny@intel.com, jmattson@google.com,
	jroedel@suse.de,  jthoughton@google.com, michael.roth@amd.com,
	oupton@kernel.org,  pankaj.gupta@amd.com, qperret@google.com,
	rick.p.edgecombe@intel.com,  rientjes@google.com,
	shivankg@amd.com, steven.price@arm.com, tabba@google.com,
	 willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
	 forkloop@google.com, pratyush@kernel.org,
	suzuki.poulose@arm.com,  aneesh.kumar@kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Thomas Gleixner <tglx@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	 Shuah Khan <shuah@kernel.org>,
	Vishal Annapurve <vannapurve@google.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	 Vlastimil Babka <vbabka@kernel.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC v3 00/43] guest_memfd: In-place conversion support
Date: Fri, 13 Mar 2026 08:45:27 -0700
Message-ID: <abQxF2Gbd7sSsCcq@google.com>
In-Reply-To: <20260313-gmem-inplace-conversion-v3-0-5fc12a70ec89@google.com>

On Fri, Mar 13, 2026, Ackerley Tng wrote:
> Hi,
> 
> (Here's the motivation for this series, which I realized was missing from
> the earlier revisions of this series)

...

> I'm intending RFC (v3) as a basis for discussion of flags/content
> modes (name TBD) to allow userspace to request guarantees on how the memory
> contents will look like after setting memory attributes. The last 6 patches
> implement content mode support. These patches will be reordered, and some
> of them could be absorbed into earlier patches, in later revisions.
> 
> Here are the discussion points I can think of (please add on):
> 
> 1. (Might hopefully resolve soon?) Should ZERO be supported on shared to
>    private conversions? Discussion is at [6].

No.  There is no use case.  The entire point of CoCo is that the VMM is untrusted.
Having the guest rely on the VMM to zero memory makes no sense whatsoever.  There
may be a contract between the trusted whatever and the guest, but that's between
those two entities, the VMM is not involved, period.

PRESERVE is different because the intent is to allow the guest to operate on
*untrusted* data.  Operating on untrusted zeros is nonsensical.

ZERO for private=>shared is different because the VMM trusts the host kernel.
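
A minimal sketch of the rule argued above, not code from the series: the
names CONTENT_MODE_ZERO, CONTENT_MODE_PRESERVE, enum conversion, and
content_mode_allowed() are all hypothetical, and the bit values are assumed
purely for illustration.  It only shows that ZERO is rejected for
shared=>private while PRESERVE and DEFAULT are accepted in both directions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only. */
#define CONTENT_MODE_ZERO      (1u << 0)
#define CONTENT_MODE_PRESERVE  (1u << 1)

static_assert((CONTENT_MODE_ZERO & CONTENT_MODE_PRESERVE) == 0,
	      "content mode bits must not overlap");

enum conversion { TO_SHARED, TO_PRIVATE };

/*
 * Return true if the requested content mode makes sense for the given
 * conversion direction.  No bits set means DEFAULT (no guarantee requested).
 */
static bool content_mode_allowed(enum conversion dir, uint32_t mode)
{
	/* Reject unknown bits. */
	if (mode & ~(CONTENT_MODE_ZERO | CONTENT_MODE_PRESERVE))
		return false;

	/* ZERO and PRESERVE contradict each other. */
	if (mode == (CONTENT_MODE_ZERO | CONTENT_MODE_PRESERVE))
		return false;

	/* shared=>private: the guest cannot rely on the VMM to zero memory. */
	if (dir == TO_PRIVATE && (mode & CONTENT_MODE_ZERO))
		return false;

	return true;
}
```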

> 2. Do we need a CAP for userspace to query the flags/modes supported?

Yes.

>    It seems like there won't be anything dynamic about the flags/modes
>    supported.
> 
>    The userspace code can check what platform it is running on, and then
>    decide ZERO or PRESERVE based on the platform:
> 
>    If the VM is running on TDX,

No.  No, no, no, no.  I have said this over, and over, and over.  The contract
is between userspace and KVM, not between userspace and the underlying CoCo
implementation.  Anything that requires making assumptions based on the VM type
is a non-starter for me.

>    it would want to specify ZERO all the
>    time. If the VM were running on pKVM it would want to specify PRESERVE
>    if it wants to enable in-place sharing, and ZERO if it wants to zero the
>    memory.
> 
>    If someday TDX supports PRESERVE, then there's room for discovery of
>    which algorithm to choose when running the guest. Perhaps that's when
>    the CAP should be introduced?
> 
> 3. What do people think of the structure of how various content modes are
>    checked for support or applied? I used overridable weak functions for
>    architectures that haven't defined support, and defined overrides for
>    x86 to show how I think it would work. For CoCo platforms, I only
>    implemented TDX for illustration purposes and might need help with the
>    other platforms. Should I have used kvm_x86_ops? I tried and found
>    myself defining lots of boilerplate.
> 
> 4. enum for ZERO and PRESERVE?
> 
>    Pros:
> 
>    * No way to define both ZERO and PRESERVE (make impossible states
>      unrepresentable)
>        * e.g. enum kvm_device_type in __u32 type in struct
>          kvm_create_device
>        * But maybe someday some modes can be used together?

Huh?  Oh, you don't mean "enum", you mean "values vs. flags".  Because in C you
can obviously have an enum of flags.

I don't have a strong preference, though I think I'd vote for flags.

Practically speaking, I doubt we'll ever have more than DEFAULT, ZERO, and PRESERVE,
i.e. more than '0', '1', and '2'.  Perhaps I lack imagination, but I can't think
of any other operation that we would want to become ABI.  ZERO is special purely
because various CoCo implementations already zero memory on conversion.  Everything
else fits into PRESERVE, because if the kernel can perform the operation, then
userspace can do the same, and likely more performantly and obviously without
needing a contract with KVM.

The only other option I can think of is if a CoCo implementation wanted to use a
specific value other than '0' to fill a page on conversion.  Given that starting
from '0' is by far the most common state in computing, I just don't see that
happening.  E.g. that'd be like adding k1salloc() in addition to kmalloc() and
kzalloc().

So, we're likely only going to have DEFAULT, ZERO, and PRESERVE, at which point
whether we use flags or values is a wash in terms of how many bits we need: 2.

If we use flags, then we can have a single CAP to enumerate all FLAGS that are
supported by KVM_SET_MEMORY_ATTRIBUTES2.  If we use values, we'd need a separate
CAP for flags and a separate CAP for conversion operations.

Using values would allow providing a dedicated field in kvm_memory_attributes2,
which _might_ make some code more readable.  But for me, that doesn't outweigh the
disadvantage of needing another CAP.
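
To make the tradeoff concrete, here is a rough sketch of the two encodings.
Everything in it is assumed for illustration: the MODE_FLAG_* and MODE_VAL_*
names, their values, and the helper functions are hypothetical and do not
reflect the actual layout of kvm_memory_attributes2.  The point is only that
both encodings fit in 2 bits, and that the values variant needs a second
"supported" enumeration on top of the one for flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1: flags.  DEFAULT is "no bit set".  Hypothetical names. */
#define MODE_FLAG_ZERO      (1u << 0)
#define MODE_FLAG_PRESERVE  (1u << 1)

/* Option 2: values in a dedicated field.  Hypothetical names. */
enum mode_value {
	MODE_VAL_DEFAULT  = 0,
	MODE_VAL_ZERO     = 1,
	MODE_VAL_PRESERVE = 2,
};

/* Either way, two bits are enough to encode the three states. */
static_assert((MODE_FLAG_ZERO | MODE_FLAG_PRESERVE) <= 3, "flags fit in 2 bits");
static_assert(MODE_VAL_PRESERVE <= 3, "values fit in 2 bits");

/*
 * Flags: a single mask of supported flags (one CAP) covers the whole field.
 * Rejecting ZERO|PRESERVE together would still need its own check.
 */
static bool flags_ok(uint32_t flags, uint32_t supported_flags)
{
	return (flags & ~supported_flags) == 0;
}

/*
 * Values: the flags field and the mode field are validated separately, so
 * userspace needs two enumerations (two CAPs), modeled here as two masks.
 */
static bool values_ok(uint32_t flags, uint32_t mode,
		      uint32_t supported_flags, uint32_t supported_modes)
{
	if (flags & ~supported_flags)
		return false;
	if (mode >= 32)
		return false;
	/* DEFAULT requests no guarantee, so it is always acceptable. */
	return mode == MODE_VAL_DEFAULT || (supported_modes & (1u << mode));
}
```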
