public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH v2 00/37] guest_memfd: In-place conversion support
@ 2026-02-02 22:29 Ackerley Tng
  0 siblings, 0 replies; 3+ messages in thread
From: Ackerley Tng @ 2026-02-02 22:29 UTC
  To: kvm, linux-doc, linux-kernel, linux-kselftest, linux-trace-kernel,
	x86
  Cc: aik, andrew.jones, binbin.wu, bp, brauner, chao.p.peng,
	chao.p.peng, chenhuacai, corbet, dave.hansen, david, hpa,
	ira.weiny, jgg, jmattson, jroedel, jthoughton, maobibo,
	mathieu.desnoyers, maz, mhiramat, michael.roth, mingo, mlevitsk,
	oupton, pankaj.gupta, pbonzini, prsampat, qperret, ricarkol,
	rick.p.edgecombe, rientjes, rostedt, seanjc, shivankg, shuah,
	steven.price, tabba, tglx, vannapurve, vbabka, willy, wyihan,
	yan.y.zhao, Ackerley Tng

Here's a second revision of guest_memfd In-place conversion support.

In this version, other than addressing comments from RFCv1 [1], the largest
change is that guest_memfd no longer avoids participating in the LRU; its
folios join the unevictable list, which is the same behavior as before this
series.

While checking for elevated refcounts during shared-to-private conversions,
guest_memfd now calls lru_add_drain_all() when it finds elevated refcounts and
rechecks them before concluding that the shared folio has true users and
erroring out.
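
The check described above can be sketched as a small userspace model. This is
only an illustration under stated assumptions, not the code from the series:
struct model_folio, the model_* helpers, and the -EAGAIN return value are all
hypothetical, and model_lru_add_drain_all() merely imitates the effect of
lru_add_drain_all() dropping references held by per-CPU LRU batches.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the fields this sketch needs. */
struct model_folio {
	int refcount;      /* current reference count */
	int expected_refs; /* references guest_memfd itself accounts for */
	int batched_refs;  /* transient references held by LRU batches */
};

/* Imitates lru_add_drain_all(): flushing the batches drops their references. */
static void model_lru_add_drain_all(struct model_folio *folios, int nr)
{
	for (int i = 0; i < nr; i++) {
		folios[i].refcount -= folios[i].batched_refs;
		folios[i].batched_refs = 0;
	}
}

/* True if any folio holds more references than guest_memfd expects. */
static bool model_has_extra_refs(const struct model_folio *folios, int nr)
{
	for (int i = 0; i < nr; i++)
		if (folios[i].refcount > folios[i].expected_refs)
			return true;
	return false;
}

/*
 * Returns 0 if the range looks safe to convert to private. If elevated
 * refcounts are seen, drain once and recheck; only then report an error.
 * The -EAGAIN value is an assumption made for this sketch.
 */
static int model_check_safe_to_convert(struct model_folio *folios, int nr)
{
	if (!model_has_extra_refs(folios, nr))
		return 0;

	model_lru_add_drain_all(folios, nr);

	return model_has_extra_refs(folios, nr) ? -EAGAIN : 0;
}
```

The drain is only attempted after a first failed check, so the common case of
no elevated refcounts does not pay for it.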

I'd still like feedback on these points, if any:

1. Having private/shared status stored in a maple tree (Thanks Michael for your
   support of using maple trees over xarrays for performance! [5]).
2. Having a new guest_memfd ioctl (not a vm ioctl) that performs conversions.
3. Using ioctls/structs/input attribute similar to the existing vm ioctl
   KVM_SET_MEMORY_ATTRIBUTES to perform conversions.
4. Storing requested attributes directly in the maple tree.
5. Using a KVM module-wide param to toggle between setting memory attributes via
   vm and guest_memfd ioctls (making them mutually exclusive: a single loaded
   KVM module can only do one of the two).
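
To make points 1 and 4 concrete, here is a minimal userspace model of storing
attributes per index range. It is a sketch, not the maple tree API and not the
code in this series: struct model_attr_map and the model_* functions are
hypothetical, a plain sorted array stands in for the tree, and treating
"no entry" as attributes 0 (shared) is an assumption. MODEL_ATTR_PRIVATE uses
the same bit value as KVM_MEMORY_ATTRIBUTE_PRIVATE.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_RANGES   64
#define MODEL_ATTR_PRIVATE (1ULL << 3) /* same bit as KVM_MEMORY_ATTRIBUTE_PRIVATE */

/* One inclusive index range [first, last] and the attributes stored for it. */
struct model_range {
	uint64_t first, last, attrs;
};

/* Sorted, non-overlapping ranges; a stand-in for the per-gmem tree. */
struct model_attr_map {
	struct model_range r[MODEL_MAX_RANGES];
	int nr;
};

/* Store attrs for [first, last], trimming or splitting overlapped ranges. */
static int model_store_range(struct model_attr_map *m, uint64_t first,
			     uint64_t last, uint64_t attrs)
{
	struct model_range out[2 * MODEL_MAX_RANGES + 1];
	int n = 0;

	if (first > last)
		return -EINVAL;

	/* Keep whatever lies strictly below the new range. */
	for (int i = 0; i < m->nr; i++) {
		struct model_range cur = m->r[i];

		if (cur.first >= first)
			continue;
		if (cur.last >= first)
			cur.last = first - 1;
		out[n++] = cur;
	}

	out[n++] = (struct model_range){ first, last, attrs };

	/* Keep whatever lies strictly above the new range. */
	for (int i = 0; i < m->nr; i++) {
		struct model_range cur = m->r[i];

		if (cur.last <= last)
			continue;
		if (cur.first <= last)
			cur.first = last + 1;
		out[n++] = cur;
	}

	if (n > MODEL_MAX_RANGES)
		return -ENOMEM;

	memcpy(m->r, out, (size_t)n * sizeof(out[0]));
	m->nr = n;
	return 0;
}

/* Look up the attributes for one index; 0 means no entry (assumed shared). */
static uint64_t model_lookup(const struct model_attr_map *m, uint64_t index)
{
	for (int i = 0; i < m->nr; i++)
		if (index >= m->r[i].first && index <= m->r[i].last)
			return m->r[i].attrs;
	return 0;
}
```

Storing MODEL_ATTR_PRIVATE over indices 0-15 and then storing 0 over indices
4-7 leaves three ranges, which is the kind of split a range-based store has to
handle when only part of a previously converted region is converted back.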

This series is based on kvm/next as at 2026-01-21, and here's the tree for your
convenience:

https://github.com/googleprodkernel/linux-cc/commits/guest_memfd-inplace-conversion-v2

The "Don't set FGP_ACCESSED when getting folios" patch from RFCv1 is still
useful but no longer related to conversion, and was posted separately [6].

Older series:

+ RFCv1 is at [1]
+ Previous versions of this feature, part of other series, are available at
  [2][3][4].

[1] https://lore.kernel.org/all/cover.1760731772.git.ackerleytng@google.com/T/
[2] https://lore.kernel.org/all/bd163de3118b626d1005aa88e71ef2fb72f0be0f.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/all/20250117163001.2326672-6-tabba@google.com/
[4] https://lore.kernel.org/all/b784326e9ccae6a08388f1bf39db70a2204bdc51.1747264138.git.ackerleytng@google.com/
[5] https://lore.kernel.org/all/20250529054227.hh2f4jmyqf6igd3i@amd.com/
[6] https://lore.kernel.org/all/20260129172646.2361462-1-ackerleytng@google.com/

Ackerley Tng (19):
  KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes
  KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2
  KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
  KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion
    safety check
  KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2
  KVM: selftests: Test using guest_memfd for guest private memory
  KVM: selftests: Test basic single-page conversion flow
  KVM: selftests: Test conversion flow when INIT_SHARED
  KVM: selftests: Test indexing in guest_memfd
  KVM: selftests: Test conversion before allocation
  KVM: selftests: Convert with allocated folios in different layouts
  KVM: selftests: Test precision of conversion
  KVM: selftests: Test that truncation does not change shared/private
    status
  KVM: selftests: Test conversion with elevated page refcount
  KVM: selftests: Reset shared memory after hole-punching
  KVM: selftests: Provide function to look up guest_memfd details from
    gpa
  KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
  KVM: selftests: Update private_mem_conversions_test to mmap()
    guest_memfd
  KVM: selftests: Add script to exercise private_mem_conversions_test

Sean Christopherson (18):
  KVM: guest_memfd: Introduce per-gmem attributes, use to guard user
    mappings
  KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES
  KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem
    is defined
  KVM: Stub in ability to disable per-VM memory attribute tracking
  KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem
    attributes
  KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs
  KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86
  KVM: Let userspace disable per-VM mem attributes, enable per-gmem
    attributes
  KVM: selftests: Create gmem fd before "regular" fd when adding memslot
  KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset}
  KVM: selftests: Add support for mmap() on guest_memfd in core library
  KVM: selftests: Add selftests global for guest memory attributes
    capability
  KVM: selftests: Add helpers for calling ioctls on guest_memfd
  KVM: selftests: Test that shared/private status is consistent across
    processes
  KVM: selftests: Provide common function to set memory attributes
  KVM: selftests: Check fd/flags provided to mmap() when setting up
    memslot
  KVM: selftests: Update pre-fault test to work with per-guest_memfd
    attributes
  KVM: selftests: Update private memory exits test work with per-gmem
    attributes

 Documentation/virt/kvm/api.rst                |  72 ++-
 arch/x86/include/asm/kvm_host.h               |   2 +-
 arch/x86/kvm/Kconfig                          |  15 +-
 arch/x86/kvm/mmu/mmu.c                        |   4 +-
 arch/x86/kvm/x86.c                            |  13 +-
 include/linux/kvm_host.h                      |  53 +-
 include/trace/events/kvm.h                    |   4 +-
 include/uapi/linux/kvm.h                      |  17 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/guest_memfd_conversions_test.c        | 486 ++++++++++++++++++
 .../testing/selftests/kvm/guest_memfd_test.c  |  57 +-
 .../testing/selftests/kvm/include/kvm_util.h  | 128 ++++-
 .../testing/selftests/kvm/include/test_util.h |  31 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    | 130 +++--
 tools/testing/selftests/kvm/lib/test_util.c   |   7 -
 .../selftests/kvm/pre_fault_memory_test.c     |   2 +-
 .../kvm/x86/private_mem_conversions_test.c    |  48 +-
 .../kvm/x86/private_mem_conversions_test.py   | 152 ++++++
 .../kvm/x86/private_mem_kvm_exits_test.c      |  36 +-
 virt/kvm/Kconfig                              |   4 +-
 virt/kvm/guest_memfd.c                        | 399 +++++++++++++-
 virt/kvm/kvm_main.c                           | 104 +++-
 23 files changed, 1590 insertions(+), 176 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_conversions_test.c
 create mode 100755 tools/testing/selftests/kvm/x86/private_mem_conversions_test.py

--
2.53.0.rc1.225.gd81095ad13-goog


* [RFC PATCH v2 00/37] guest_memfd: In-place conversion support
@ 2026-02-02 22:36 Ackerley Tng
  2026-02-20  9:09 ` Lisa Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Ackerley Tng @ 2026-02-02 22:36 UTC
  To: kvm, linux-doc, linux-kernel, linux-kselftest, linux-trace-kernel,
	x86
  Cc: aik, andrew.jones, binbin.wu, bp, brauner, chao.p.peng,
	chao.p.peng, chenhuacai, corbet, dave.hansen, david, hpa,
	ira.weiny, jgg, jmattson, jroedel, jthoughton, maobibo,
	mathieu.desnoyers, maz, mhiramat, michael.roth, mingo, mlevitsk,
	oupton, pankaj.gupta, pbonzini, prsampat, qperret, ricarkol,
	rick.p.edgecombe, rientjes, rostedt, seanjc, shivankg, shuah,
	steven.price, tabba, tglx, vannapurve, vbabka, willy, wyihan,
	yan.y.zhao, Ackerley Tng

(resending to fix Message-ID)



* Re: [RFC PATCH v2 00/37] guest_memfd: In-place conversion support
  2026-02-02 22:36 Ackerley Tng
@ 2026-02-20  9:09 ` Lisa Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Lisa Wang @ 2026-02-20  9:09 UTC
  To: Ackerley Tng
  Cc: kvm, linux-doc, linux-kernel, linux-kselftest, linux-trace-kernel,
	x86, aik, andrew.jones, binbin.wu, bp, brauner, chao.p.peng,
	chao.p.peng, chenhuacai, corbet, dave.hansen, david, hpa,
	ira.weiny, jgg, jmattson, jroedel, jthoughton, maobibo,
	mathieu.desnoyers, maz, mhiramat, michael.roth, mingo, mlevitsk,
	oupton, pankaj.gupta, pbonzini, prsampat, qperret, ricarkol,
	rick.p.edgecombe, rientjes, rostedt, seanjc, shivankg, shuah,
	steven.price, tabba, tglx, vannapurve, vbabka, willy, yan.y.zhao

On Mon, Feb 02, 2026 at 02:36:37PM -0800, Ackerley Tng wrote:
> (resending to fix Message-ID)
> 
> Here's a second revision of guest_memfd In-place conversion support.
> 
> In this version, other than addressing comments from RFCv1 [1], the largest
> change is that guest_memfd now does not avoid participation in LRU; it
> participates in LRU by joining the unevictable list (no change from before this
> series).
> 
> While checking for elevated refcounts during shared to private conversions,
> guest_memfd will now do an lru_add_drain_all() if elevated refcounts were found,
> before concluding that there are true users of the shared folio and erroring
> out.
> 
> I'd still like feedback on these points, if any:
> 
> 1. Having private/shared status stored in a maple tree (Thanks Michael for your
>    support of using maple trees over xarrays for performance! [5]).
> 2. Having a new guest_memfd ioctl (not a vm ioctl) that performs conversions.
> 3. Using ioctls/structs/input attribute similar to the existing vm ioctl
>    KVM_SET_MEMORY_ATTRIBUTES to perform conversions.
> 4. Storing requested attributes directly in the maple tree.
> 5. Using a KVM module-wide param to toggle between setting memory attributes via
>    vm and guest_memfd ioctls (making them mutually exclusive - a single loaded
>    KVM module can only do one of the two.).
> 
> [...snip...]
>
> 
> --
> 2.53.0.rc1.225.gd81095ad13-goog

I’ve tested memory failure handling after applying this series and here’s what
memory_failure() does:

Shared memory: In line with other in-memory filesystems, the memory_failure()
handler unmaps the page if it is currently mapped. In my tests it issued a
SIGBUS only when the page was both mapped and dirty, and then only
  - if memory failure was injected with MF_ACTION_REQUIRED or
  - if the test process’s memory corruption kill policy is PR_MCE_KILL_EARLY

Here’s the above, in table form:

| MF_ACTION_REQUIRED | Kill Policy         | Mapped | Dirty | Result: SIGBUS |
|--------------------|---------------------|--------|-------|----------------|
| false              | PR_MCE_KILL_EARLY   | true   | true  | true           |
| false              | PR_MCE_KILL_EARLY   | true   | false | false          |
| false              | PR_MCE_KILL_EARLY   | false  | true  | false          |
| false              | PR_MCE_KILL_EARLY   | false  | false | false          |
| false              | PR_MCE_KILL_LATE    | true   | true  | false          |
| false              | PR_MCE_KILL_LATE    | true   | false | false          |
| false              | PR_MCE_KILL_LATE    | false  | true  | false          |
| false              | PR_MCE_KILL_LATE    | false  | false | false          |
| true               | Any Policy          | true   | true  | true           |
| true               | Any Policy          | true   | false | false          |
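
The table above reduces to a single predicate. The helper below is only a
restatement of these observed results, not kernel logic; the function name
expect_sigbus_shared() and its parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Restates the shared-memory table: a SIGBUS was observed only for a page
 * that was both mapped and dirty, and then only if the failure was injected
 * with MF_ACTION_REQUIRED or the kill policy was PR_MCE_KILL_EARLY.
 */
static bool expect_sigbus_shared(bool action_required, bool kill_early,
				 bool mapped, bool dirty)
{
	return mapped && dirty && (action_required || kill_early);
}
```

A test that walks every combination of the four inputs can compare the
delivered signal against this predicate instead of hard-coding each row.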

(I used MADV_HWPOISON to inject memory failures with MF_ACTION_REQUIRED set, and
there was no way to use MADV_HWPOISON without first mapping the page in. To
inject memory failures without MF_ACTION_REQUIRED set, I used debugfs’
hwpoison/corrupt-pfn.)
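
For reference, the kill policy column can be selected from the test process
with prctl(). This is a minimal sketch assuming a Linux system whose headers
provide PR_MCE_KILL; the wrapper names set_mce_kill_early() and
get_mce_kill_policy() are hypothetical and are not taken from the selftests.
It does not inject any memory failure.

```c
#include <sys/prctl.h>

/*
 * Opt the calling thread into the early-kill policy.
 * Returns 0 on success, or -1 with errno set.
 */
static int set_mce_kill_early(void)
{
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/*
 * Read back the current policy: PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or
 * PR_MCE_KILL_DEFAULT, or -1 with errno set on failure.
 */
static int get_mce_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

Reading the policy back before injecting a failure makes it easier to tell a
missing SIGBUS apart from a policy that was never applied.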

Private memory: The handler unmaps the page from the stage 2 page tables and
does not issue a SIGBUS, since the page is private to the guest and is never
mapped into the host.

| MF_ACTION_REQUIRED | Kill Policy         | Mapped | Dirty | Result: SIGBUS |
|--------------------|---------------------|--------|-------|----------------|
| false              | PR_MCE_KILL_EARLY   | false  | true  | false          |
| false              | PR_MCE_KILL_EARLY   | false  | false | false          |
| false              | PR_MCE_KILL_LATE    | false  | true  | false          |
| false              | PR_MCE_KILL_LATE    | false  | false | false          |

(I couldn’t use MADV_HWPOISON since private memory cannot be mapped into
userspace and hence has no userspace address.)

I’ll post updated memory failure tests together with the next revision of the
series at [1], which fixes MF_DELAYED handling on memory failure.

[1] https://lore.kernel.org/all/cover.1760551864.git.wyihan@google.com/T/

