* [PATCH v5 00/12] Direct Map Removal Support for guest_memfd
@ 2025-08-28  9:39 Roy, Patrick
From: Roy, Patrick @ 2025-08-28  9:39 UTC
  To: david@redhat.com, seanjc@google.com
  Cc: Roy, Patrick, tabba@google.com, ackerleytng@google.com,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, rppt@kernel.org,
	will@kernel.org, vbabka@suse.cz, Cali, Marco, Kalyazin, Nikita,
	Thomson, Jack, Manwaring, Derek

[ based on kvm/next ]

Unmapping virtual machine guest memory from the host kernel's direct map is an
effective mitigation against Spectre-style transient execution issues: if the
kernel page tables do not contain entries pointing to guest memory, then any
attempted speculative read through the direct map will necessarily be blocked
by the MMU before any observable microarchitectural side effects happen. This
means that Spectre gadgets and similar cannot be used to target virtual machine
memory. Roughly 60% of speculative execution issues fall into this category [1,
Table 1].

This patch series extends guest_memfd with the ability to remove its memory
from the host kernel's direct map, so that KVM guests backed by guest_memfd can
attain the above protection.

=== Design ===

We build on top of guest_memfd's recent support for "non-confidential VMs", in
which all of guest_memfd is mappable to userspace (i.e. considered "shared").
For such VMs, all guest page faults are routed through guest_memfd's special
page fault handler, which, because it consumes fd+offset directly, can map
direct map removed memory into the guest. KVM's internal accesses to guest
memory are handled by providing each memslot with a userspace mapping of that
memslot's guest_memfd via userspace_addr. Since KVM's internal accesses are
almost exclusively handled via copy_from_user() and friends, this allows KVM to
access direct map removed guest memory for features such as MMIO instruction
emulation on x86 or pvtime support on ARM64.
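As a rough illustration of why the userspace_addr mapping is enough for
internal accesses, here is a minimal userspace C sketch of a memslot lookup: a
guest physical address is translated into an address inside the slot's
userspace mapping, which is then read with an ordinary copy. The names
sketch_memslot, sketch_gpa_to_hva() and sketch_read_guest() are hypothetical
simplifications, not KVM's actual code, and memcpy() stands in for
copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical, simplified stand-in for a KVM memslot. */
struct sketch_memslot {
	uint64_t base_gfn;		/* first guest frame number in the slot */
	uint64_t npages;		/* number of 4k pages in the slot */
	unsigned char *userspace_addr;	/* userspace mapping of the slot's guest_memfd */
};

/* Translate a guest physical address into the slot's userspace mapping.
 * Returns NULL if the address is outside the slot. */
static unsigned char *sketch_gpa_to_hva(const struct sketch_memslot *slot,
					uint64_t gpa)
{
	uint64_t gfn = gpa >> SKETCH_PAGE_SHIFT;

	assert(slot != NULL);
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return NULL;
	return slot->userspace_addr +
	       (gpa - (slot->base_gfn << SKETCH_PAGE_SHIFT));
}

/* Read guest memory through the userspace mapping only; memcpy() stands
 * in for copy_from_user(). Returns 0 on success, -1 if the range does
 * not lie entirely inside the slot. */
static int sketch_read_guest(const struct sketch_memslot *slot, uint64_t gpa,
			     void *dst, size_t len)
{
	unsigned char *hva = sketch_gpa_to_hva(slot, gpa);
	uint64_t slot_end = (slot->base_gfn + slot->npages) << SKETCH_PAGE_SHIFT;

	if (!hva || len > slot_end - gpa)
		return -1;
	memcpy(dst, hva, len);
	return 0;
}
```

The read only ever dereferences the userspace mapping, so in this model it
never needs a direct map address for the backing page.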

=== Implementation ===

The KVM_CREATE_GUEST_MEMFD ioctl gains a new flag,
GUEST_MEMFD_FLAG_NO_DIRECT_MAP. If this flag is passed, guest_memfd removes the
direct map entries of its folios as part of preparing them. When the memory is
freed, the direct map entries are restored before gmem's arch-specific
invalidation callback runs.
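The ordering described above can be modelled as a small state machine. This is
a hypothetical, single-threaded sketch: the sketch_* names and the flag value
are made up for illustration and are neither the real uAPI value nor the
kernel's functions. Preparing a folio of a NO_DIRECT_MAP guest_memfd drops its
simulated direct map entry, and freeing restores it before the invalidation
callback runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit, for this sketch only (not the real uAPI value). */
#define SKETCH_FLAG_NO_DIRECT_MAP (1u << 0)

struct sketch_folio {
	bool direct_mapped;	/* is the folio present in the simulated direct map? */
	bool prepared;
};

struct sketch_gmem {
	unsigned int flags;
	/* Direct map state observed by the invalidation callback. */
	bool invalidate_saw_direct_map;
};

static void sketch_prepare_folio(struct sketch_gmem *gmem,
				 struct sketch_folio *folio)
{
	if (gmem->flags & SKETCH_FLAG_NO_DIRECT_MAP)
		folio->direct_mapped = false;	/* remove direct map entry */
	folio->prepared = true;
}

/* Stand-in for gmem's arch-specific invalidation callback. */
static void sketch_arch_invalidate(struct sketch_gmem *gmem,
				   struct sketch_folio *folio)
{
	gmem->invalidate_saw_direct_map = folio->direct_mapped;
}

static void sketch_free_folio(struct sketch_gmem *gmem,
			      struct sketch_folio *folio)
{
	assert(folio->prepared);
	/* Restore the direct map first, then invalidate. The memory has to
	 * be direct mapped again before it returns to the page allocator. */
	if (gmem->flags & SKETCH_FLAG_NO_DIRECT_MAP)
		folio->direct_mapped = true;
	sketch_arch_invalidate(gmem, folio);
	folio->prepared = false;
}
```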

Support for the flag can be discovered via the KVM_CAP_GMEM_NO_DIRECT_MAP
capability, which is only available if direct map modifications at 4k
granularity are architecturally possible and KVM can successfully map direct
map removed memory into the guest.

=== Testing ===

KVM selftests are extended to cover the above-described non-CoCo workflows, in
which guest_memfd with direct map entries removed backs all of guest memory,
and to exercise some simple MMIO paths.

Additionally, a Firecracker branch with support for these VMs can be found on
GitHub [2].

=== Changes since v4 ===

- Rebase on top of kvm/next
- Stop using PG_private to track direct map removal state
- Fix build of KVM-as-a-module by using the new EXPORT_SYMBOL_FOR_MODULES

=== FAQ ===

--- why not reuse memfd_secret() / a bespoke guest memory solution? ---

Removing guest memory from the direct map means guest page faults cannot be
resolved by GUP-ing userspace mappings of guest memory, as GUP is disabled for
direct map removed memory (GUP currently has no way to know that a specific
GUP request will not subsequently dereference page_address()).
guest_memfd already has a special path inside KVM that consumes fd+offset
instead, so it makes sense to reuse it. Additionally, this means that
direct map removed VMs can benefit from active development on guest_memfd, such
as huge page support.
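The shortlog introduces an AS_NO_DIRECT_MAP address_space flag. Assuming GUP
consults a per-mapping flag of that kind, the refusal could look roughly like
the following userspace model. The bit number, the struct layouts and the
sketch_* helpers are hypothetical and do not reproduce mm/gup.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number, for this sketch only. */
enum { SKETCH_AS_NO_DIRECT_MAP = 10 };

struct sketch_address_space {
	unsigned long flags;
};

struct sketch_folio {
	struct sketch_address_space *mapping;	/* NULL: anonymous in this sketch */
};

static void sketch_mapping_set_no_direct_map(struct sketch_address_space *m)
{
	m->flags |= 1UL << SKETCH_AS_NO_DIRECT_MAP;
}

static bool sketch_mapping_no_direct_map(const struct sketch_address_space *m)
{
	return m && (m->flags & (1UL << SKETCH_AS_NO_DIRECT_MAP));
}

/* GUP cannot tell whether its caller will later dereference
 * page_address(), so it conservatively refuses such folios. */
static bool sketch_gup_may_pin(const struct sketch_folio *folio)
{
	assert(folio != NULL);
	return !sketch_mapping_no_direct_map(folio->mapping);
}
```

Because a check like this fails for every folio of such a mapping, guest faults
have to be resolved through guest_memfd's fd+offset path instead.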

--- why do KVM internal accesses go through userspace page tables? ---

For traditional VMs, all KVM internal accesses are done through the
userspace_addr stored in a memslot, meaning no changes to most KVM code are
needed just to allow access to guest_memfd backed / direct map removed guest
memory of non-confidential VMs. Previous iterations of this series tried to
avoid userspace mappings, instead attempting to dynamically restore direct map
entries for internal accesses [RFCv2], but this turned out to have a
significant performance impact, as well as additional complexity due to needing
to refcount direct map reinsertion operations and making them play nicely with
gmem truncations.
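To make the refcounting requirement of the RFCv2 approach concrete, here is a
hypothetical single-threaded model of per-page direct map reinsertion: the
first internal accessor restores the entry and the last one removes it again.
None of these names come from the series. A real implementation would also
need atomics or locking, would have to flush the TLB when removing entries,
and would have to coordinate with gmem truncation while references are held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state for an RFCv2-style approach. */
struct sketch_page {
	unsigned int restore_refs;	/* in-flight internal accesses */
	bool direct_mapped;
};

/* The first accessor re-inserts the direct map entry. */
static void sketch_restore_get(struct sketch_page *p)
{
	if (p->restore_refs++ == 0)
		p->direct_mapped = true;
}

/* The last accessor removes the direct map entry again. */
static void sketch_restore_put(struct sketch_page *p)
{
	assert(p->restore_refs > 0);
	if (--p->restore_refs == 0)
		p->direct_mapped = false;
}
```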

--- what doesn't work with direct map removed VMs? ---

The only thing I'm aware of is kvm-clock, since it tries to GUP guest memory
via gfn_to_pfn_cache. Realistically, this is only a problem on AMD, as Intel
guests can use the TSC as a clocksource (Intel allows discovery of the TSC
frequency via CPUID, while AMD doesn't). AMD guests fall back to a calibration
routine, which fails most of the time.

[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/
[v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/


Elliot Berman (1):
  filemap: Pass address_space mapping to ->free_folio()

Patrick Roy (11):
  arch: export set_direct_map_valid_noflush to KVM module
  mm: introduce AS_NO_DIRECT_MAP
  KVM: guest_memfd: Add flag to remove from direct map
  KVM: Documentation: describe GUEST_MEMFD_FLAG_NO_DIRECT_MAP
  KVM: selftests: load elf via bounce buffer
  KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd
    != -1
  KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in mem conversion
    tests
  KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in
    guest_memfd_test.c
  KVM: selftests: Test guest execution from direct map removed gmem

 Documentation/filesystems/locking.rst         |  2 +-
 Documentation/virt/kvm/api.rst                |  5 ++
 arch/arm64/include/asm/kvm_host.h             | 12 ++++
 arch/arm64/mm/pageattr.c                      |  1 +
 arch/loongarch/mm/pageattr.c                  |  1 +
 arch/riscv/mm/pageattr.c                      |  1 +
 arch/s390/mm/pageattr.c                       |  1 +
 arch/x86/mm/pat/set_memory.c                  |  1 +
 fs/nfs/dir.c                                  | 11 ++--
 fs/orangefs/inode.c                           |  3 +-
 include/linux/fs.h                            |  2 +-
 include/linux/kvm_host.h                      |  7 +++
 include/linux/pagemap.h                       | 16 +++++
 include/linux/secretmem.h                     | 18 ------
 include/uapi/linux/kvm.h                      |  2 +
 lib/buildid.c                                 |  4 +-
 mm/filemap.c                                  |  9 +--
 mm/gup.c                                      | 14 +----
 mm/mlock.c                                    |  2 +-
 mm/secretmem.c                                |  9 +--
 mm/vmscan.c                                   |  4 +-
 .../testing/selftests/kvm/guest_memfd_test.c  |  2 +
 .../testing/selftests/kvm/include/kvm_util.h  | 37 ++++++++---
 .../testing/selftests/kvm/include/test_util.h |  8 +++
 tools/testing/selftests/kvm/lib/elf.c         |  8 +--
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 61 +++++++++++--------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 .../selftests/kvm/set_memory_region_test.c    | 50 +++++++++++++--
 .../kvm/x86/private_mem_conversions_test.c    |  7 ++-
 virt/kvm/guest_memfd.c                        | 32 ++++++++--
 virt/kvm/kvm_main.c                           |  5 ++
 34 files changed, 264 insertions(+), 104 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.50.1



