From: Ackerley Tng <ackerleytng@google.com>
To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
brauner@kernel.org, chao.p.peng@linux.intel.com,
david@kernel.org, ira.weiny@intel.com, jmattson@google.com,
jroedel@suse.de, jthoughton@google.com, michael.roth@amd.com,
oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com,
rick.p.edgecombe@intel.com, rientjes@google.com,
shivankg@amd.com, steven.price@arm.com, tabba@google.com,
willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
forkloop@google.com, pratyush@kernel.org,
suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Shuah Khan <shuah@kernel.org>,
Vishal Annapurve <vannapurve@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Vlastimil Babka <vbabka@kernel.org>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
Ackerley Tng <ackerleytng@google.com>
Subject: [PATCH RFC v4 00/44] guest_memfd: In-place conversion support
Date: Thu, 26 Mar 2026 15:24:09 -0700 [thread overview]
Message-ID: <20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com> (raw)
This is RFC v4 of guest_memfd in-place conversion support.
Up till now, guest_memfd supports the entire inode worth of memory being
used as all-shared, or all-private. CoCo VMs may request guest memory to be
converted between private and shared states, and the only way to support
that currently would be to have the userspace VMM provide two sources of
backing memory from completely different areas of physical memory.
pKVM has a use case for in-place sharing: the guest and host may be
cooperating on given data, and pKVM doesn't protect data through
encryption, so copying that given data between different areas of physical
memory as part of conversions would be unnecessary work.
This series also serves as a foundation for guest_memfd huge page
support. Now, guest_memfd only supports PAGE_SIZE pages, so if two sources
of backing memory are used, the userspace VMM could maintain a steady total
memory utilized by punching out the pages that are not used. When huge
pages are available in guest_memfd, even if the backing memory source
supports hole punching within a huge page, punching out pages to maintain
the total memory utilized by a VM would be introducing lots of
fragmentation.
In-place conversion avoids fragmentation by allowing the same physical
memory to be used for both shared and private memory, with guest_memfd
tracks the shared/private status of all the pages at a per-page
granularity.
The central principle, which guest_memfd continues to uphold, is that any
guest-private page will not be mappable to host userspace. All pages will
be mmap()-able in host userspace, but accesses to guest-private pages (as
tracked by guest_memfd) will result in a SIGBUS.
This series introduces a guest_memfd ioctl (not kvm, vm or vcpu, but
guest_memfd ioctl) that allows userspace to set memory
attributes (shared/private) directly through the guest_memfd. This is the
appropriate interface because shared/private-ness is a property of memory
and hence the request should be sent directly to the memory provider -
guest_memfd.
RFC v4 integrates comments from RFC v3:
+ ZERO is not supported on shared to private conversions
+ Adds KVM_CAP_GUEST_MEMFD_SET_MEMORY_ATTRIBUTES2_FLAGS to enumerate
supported content modes for a given VM, or all supported content modes if
no VM is provided
+ Uses flags and not values to specify content modes for conversion
+ Allows architectures to override the content mode application for the
entire range rather than per-folio: so if actions can be skipped, folio
iteration can be skipped entirely.
+ Addresses comments from Sashiko [7]
I would like feedback on:
+ Content modes: 0 (MODE_UNSPECIFIED), ZERO, and PRESERVE. Is that all
good, or does anyone think there is a use case for something else?
+ Should the content modes apply even if no attribute changes are required?
+ See notes added in "KVM: guest_memfd: Apply content modes while
setting memory attributes"
+ Possibly related: should setting attributes be allowed if some
sub-range requested already has the requested attribute?
+ Structure of how various content modes are checked for support or
applied? I used overridable weak functions for architectures that haven't
defined support, and defined overrides for x86 to show how I think it would
work. For CoCo platforms, I only implemented TDX for illustration purposes
and might need help with the other platforms. Should I have used
kvm_x86_ops? I tried and found myself defining lots of boilerplate.
+ The use of private_mem_conversions_test.sh to run different options in
private_mem_conversions_test. If this makes sense, I'll adjust the
Makefile to have private_mem_conversions_test tested only via the script.
TODOs
+ Address locking issue when kvm_gmem_get_attribute() is called from
kvm_mmu_zap_collapsible_spte(). In this path, KVM's MMU lock is held
while guest_memfd tries to take filemap_invalidate_lock while looking up
the attributes xarray.
+ Move guest_memfd_conversions_test.c to only be compiled and tested for
x86, since it depends so heavily on KVM_X86_SW_PROTECTED_VM's as a
testing vehicle
This series is based on kvm/next, and here's the tree for your convenience:
https://github.com/googleprodkernel/linux-cc/commits/guest_memfd-inplace-conversion-v4
Older series:
+ RFCv3 is at [6]
+ RFCv2 is at [5]
+ RFCv1 is at [4]
+ Previous versions of this feature, part of other series, are available at
[1][2][3].
[1] https://lore.kernel.org/all/bd163de3118b626d1005aa88e71ef2fb72f0be0f.1726009989.git.ackerleytng@google.com/
[2] https://lore.kernel.org/all/20250117163001.2326672-6-tabba@google.com/
[3] https://lore.kernel.org/all/b784326e9ccae6a08388f1bf39db70a2204bdc51.1747264138.git.ackerleytng@google.com/
[4] https://lore.kernel.org/all/cover.1760731772.git.ackerleytng@google.com/T/
[5] https://lore.kernel.org/all/cover.1770071243.git.ackerleytng@google.com/T/
[6] https://lore.kernel.org/r/20260313-gmem-inplace-conversion-v3-0-5fc12a70ec89@google.com
[7] https://sashiko.dev/#/patchset/20260313-gmem-inplace-conversion-v3-0-5fc12a70ec89%40google.com
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
Ackerley Tng (26):
KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes
KVM: guest_memfd: Only prepare folios for private pages
KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2
KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check
KVM: guest_memfd: Introduce default handlers for content modes
KVM: guest_memfd: Apply content modes while setting memory attributes
KVM: x86: Add support for applying content modes
KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags
KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2
KVM: selftests: Test using guest_memfd for guest private memory
KVM: selftests: Test basic single-page conversion flow
KVM: selftests: Test conversion flow when INIT_SHARED
KVM: selftests: Test conversion precision in guest_memfd
KVM: selftests: Test conversion before allocation
KVM: selftests: Convert with allocated folios in different layouts
KVM: selftests: Test that truncation does not change shared/private status
KVM: selftests: Test conversion with elevated page refcount
KVM: selftests: Test that conversion to private does not support ZERO
KVM: selftests: Support checking that data not equal expected
KVM: selftests: Test that not specifying a conversion flag scrambles memory contents
KVM: selftests: Reset shared memory after hole-punching
KVM: selftests: Provide function to look up guest_memfd details from gpa
KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd
KVM: selftests: Add script to exercise private_mem_conversions_test
Sean Christopherson (18):
KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings
KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES
KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined
KVM: Stub in ability to disable per-VM memory attribute tracking
KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs
KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86
KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes
KVM: selftests: Create gmem fd before "regular" fd when adding memslot
KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset}
KVM: selftests: Add support for mmap() on guest_memfd in core library
KVM: selftests: Add selftests global for guest memory attributes capability
KVM: selftests: Add helpers for calling ioctls on guest_memfd
KVM: selftests: Test that shared/private status is consistent across processes
KVM: selftests: Provide common function to set memory attributes
KVM: selftests: Check fd/flags provided to mmap() when setting up memslot
KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes
KVM: selftests: Update private memory exits test to work with per-gmem attributes
Documentation/virt/kvm/api.rst | 136 ++++-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 15 +-
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/x86.c | 114 ++++-
include/linux/kvm_host.h | 77 ++-
include/trace/events/kvm.h | 4 +-
include/uapi/linux/kvm.h | 22 +
mm/swap.c | 2 +
tools/testing/selftests/kvm/Makefile.kvm | 5 +
.../selftests/kvm/guest_memfd_conversions_test.c | 552 ++++++++++++++++++++
tools/testing/selftests/kvm/guest_memfd_test.c | 57 ++-
tools/testing/selftests/kvm/include/kvm_util.h | 144 +++++-
tools/testing/selftests/kvm/include/test_util.h | 34 +-
.../selftests/kvm/kvm_has_gmem_attributes.c | 17 +
tools/testing/selftests/kvm/lib/kvm_util.c | 130 +++--
tools/testing/selftests/kvm/lib/test_util.c | 7 -
tools/testing/selftests/kvm/lib/x86/sev.c | 2 +-
.../testing/selftests/kvm/pre_fault_memory_test.c | 4 +-
.../kvm/x86/private_mem_conversions_test.c | 55 +-
.../kvm/x86/private_mem_conversions_test.sh | 128 +++++
.../selftests/kvm/x86/private_mem_kvm_exits_test.c | 38 +-
virt/kvm/Kconfig | 3 +-
virt/kvm/guest_memfd.c | 562 ++++++++++++++++++++-
virt/kvm/kvm_main.c | 116 ++++-
25 files changed, 2047 insertions(+), 183 deletions(-)
---
base-commit: d2ea4ff1ce50787a98a3900b3fb1636f3620b7cf
change-id: 20260225-gmem-inplace-conversion-bd0dbd39753a
Best regards,
--
Ackerley Tng <ackerleytng@google.com>
next reply other threads:[~2026-03-26 22:24 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-26 22:24 Ackerley Tng [this message]
2026-03-26 22:24 ` [PATCH RFC v4 01/44] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 02/44] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 03/44] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 04/44] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 05/44] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 06/44] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 07/44] KVM: guest_memfd: Only prepare folios for private pages Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 08/44] KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 09/44] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 11/44] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 12/44] KVM: guest_memfd: Introduce default handlers for content modes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 13/44] KVM: guest_memfd: Apply content modes while setting memory attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 14/44] KVM: x86: Add support for applying content modes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 15/44] KVM: Add CAP to enumerate supported SET_MEMORY_ATTRIBUTES2 flags Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 16/44] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 17/44] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 18/44] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 19/44] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 20/44] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 21/44] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 22/44] KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 23/44] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 24/44] KVM: selftests: Test using guest_memfd for guest private memory Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 25/44] KVM: selftests: Test basic single-page conversion flow Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 26/44] KVM: selftests: Test conversion flow when INIT_SHARED Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 27/44] KVM: selftests: Test conversion precision in guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 28/44] KVM: selftests: Test conversion before allocation Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 29/44] KVM: selftests: Convert with allocated folios in different layouts Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 30/44] KVM: selftests: Test that truncation does not change shared/private status Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 31/44] KVM: selftests: Test that shared/private status is consistent across processes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 32/44] KVM: selftests: Test conversion with elevated page refcount Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 33/44] KVM: selftests: Test that conversion to private does not support ZERO Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 34/44] KVM: selftests: Support checking that data not equal expected Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 35/44] KVM: selftests: Test that not specifying a conversion flag scrambles memory contents Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 36/44] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 37/44] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 38/44] KVM: selftests: Provide common function to set memory attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 39/44] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 40/44] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 41/44] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 42/44] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 43/44] KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes Ackerley Tng
2026-03-26 22:24 ` [PATCH RFC v4 44/44] KVM: selftests: Update private memory exits test to work with per-gmem attributes Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 0/6] guest_memfd in-place conversion selftests for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 1/6] KVM: selftests: Initialize guest_memfd with INIT_SHARED Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 2/6] KVM: selftests: Call snp_launch_update_data() providing copy of memory Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 3/6] KVM: selftests: Make guest_code_xsave more friendly Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 4/6] KVM: selftests: Allow specifying CoCo-privateness while mapping a page Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 5/6] KVM: selftests: Test conversions for SNP Ackerley Tng
2026-03-26 23:36 ` [POC PATCH 6/6] KVM: selftests: Test content modes ZERO and PRESERVE " Ackerley Tng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com \
--to=ackerleytng@google.com \
--cc=aik@amd.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.jones@linux.dev \
--cc=aneesh.kumar@kernel.org \
--cc=axelrasmussen@google.com \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=binbin.wu@linux.intel.com \
--cc=bp@alien8.de \
--cc=brauner@kernel.org \
--cc=chao.p.peng@linux.intel.com \
--cc=chrisl@kernel.org \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@kernel.org \
--cc=forkloop@google.com \
--cc=hpa@zytor.com \
--cc=ira.weiny@intel.com \
--cc=jgg@ziepe.ca \
--cc=jmattson@google.com \
--cc=jroedel@suse.de \
--cc=jthoughton@google.com \
--cc=kasong@tencent.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=michael.roth@amd.com \
--cc=mingo@redhat.com \
--cc=nphamcs@gmail.com \
--cc=oupton@kernel.org \
--cc=pankaj.gupta@amd.com \
--cc=pbonzini@redhat.com \
--cc=pratyush@kernel.org \
--cc=qperret@google.com \
--cc=rick.p.edgecombe@intel.com \
--cc=rientjes@google.com \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=shikemeng@huaweicloud.com \
--cc=shivankg@amd.com \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=steven.price@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=tglx@kernel.org \
--cc=vannapurve@google.com \
--cc=vbabka@kernel.org \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=wyihan@google.com \
--cc=x86@kernel.org \
--cc=yan.y.zhao@intel.com \
--cc=yuanchu@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox