* [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd
@ 2024-07-10 23:42 James Houghton
From: James Houghton @ 2024-07-10 23:42 UTC
  To: Paolo Bonzini
  Cc: Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Sean Christopherson, Shuah Khan, Peter Xu,
	Axel Rasmussen, David Matlack, James Houghton, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm

This patch series implements the KVM-based demand paging system that was
first introduced back in November[1] by David Matlack.

The working name for this new system is KVM Userfault, but that name is
confusing, so it will not be the final name.

Problem: post-copy with guest_memfd
===================================

Post-copy live migration makes it possible to migrate VMs from one host
to another no matter how fast they are writing to memory, while keeping
the VM paused for a minimal amount of time. For post-copy to work, we
need:
 1. a way to prevent KVM from accessing particular pages of guest
    memory until we have populated them,
 2. a way for userspace to know when KVM is trying to access a
    particular page, and
 3. a way to allow the access to proceed.

Traditionally, post-copy live migration is implemented using
userfaultfd, which hooks into the main mm fault path. KVM hits this path
when it is doing HVA -> PFN translations (with GUP) or when it itself
attempts to access guest memory. Userfaultfd sends a page fault
notification to userspace, and KVM goes to sleep.

Userfaultfd works well, as it is not specific to KVM; everyone who
attempts to access guest memory will block the same way.

However, with guest_memfd, we do not use GUP to translate from GFN to
HPA (nor is there an intermediate HVA).

So userfaultfd in its current form cannot be used to support post-copy
live migration with guest_memfd-backed VMs.

Solution: hook into the gfn -> pfn translation
==============================================

The only way to implement post-copy with a non-KVM-specific
userfaultfd-like system would be to introduce the concept of a
file-userfault[2] to intercept faults on a guest_memfd.

Instead, we take the simpler approach of adding a KVM-specific API, and
we hook into the GFN -> HVA or GFN -> PFN translation steps (for
traditional memslots and for guest_memfd respectively).

I have intentionally added support for traditional memslots: the added
complexity is minimal, and it is useful for some VMMs, which can use it
to fully implement post-copy live migration.

Implementation Details
======================

Let's break down how KVM implements each of the three core requirements
for implementing post-copy as laid out above:

--- Preventing access: KVM_MEMORY_ATTRIBUTE_USERFAULT ---

The most straightforward way to inform KVM of userfault-enabled pages is
to use a new memory attribute, say KVM_MEMORY_ATTRIBUTE_USERFAULT.

There is already infrastructure in place for modifying and checking
memory attributes. Using this interface is slightly challenging, as there
is no UAPI for setting or clearing individual attributes; userspace must
always supply the complete set of attributes it wants for a range.
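
To illustrate (this is not code from the series): a VMM that wants to
clear only the userfault attribute has to remember the other attributes
on the range and write the full set back. The struct below mirrors the
layout of struct kvm_memory_attributes; the helper name and the
ATTR_USERFAULT bit value are made up for this sketch, and the resulting
request would then be passed to KVM_SET_MEMORY_ATTRIBUTES.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as struct kvm_memory_attributes in <linux/kvm.h>. */
struct mem_attr_req {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

static_assert(sizeof(struct mem_attr_req) == 32, "four packed u64 fields");

/* Illustrative bit values; the userfault bit is an assumption. */
#define ATTR_PRIVATE   (1ULL << 3)
#define ATTR_USERFAULT (1ULL << 4)

/*
 * Build a request that drops only the userfault attribute. The caller
 * must pass the attributes it believes are currently set, because the
 * ioctl replaces the whole attribute set for the range.
 */
static struct mem_attr_req clear_userfault(uint64_t gpa, uint64_t size,
					   uint64_t current_attrs)
{
	struct mem_attr_req req = {
		.address = gpa,
		.size = size,
		.attributes = current_attrs & ~ATTR_USERFAULT,
		.flags = 0,
	};

	return req;
}
```

If the VMM's idea of current_attrs is stale, the write-back silently
changes other attributes too, which is why a set/clear style interface
would be easier to use here.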

The synchronization in place for updating memory attributes is also a
poor fit for post-copy live migration, which requires updating memory
attributes (from userfault to no-userfault) very frequently.

Another potential interface could be to use something akin to a dirty
bitmap, where a bitmap describes which pages within a memslot (or VM)
should trigger userfaults. This way, it is straightforward to make
updates to the userfault status of a page cheap.
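
A minimal sketch of what such a bitmap could look like, assuming one bit
per page in a memslot and lock-free atomic updates. None of these names
exist in the series; this only shows why clearing a page's userfault
status can be a single atomic operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-memslot bitmap: bit N covers gfn base_gfn + N. */
struct userfault_bitmap {
	uint64_t base_gfn;
	size_t npages;
	_Atomic unsigned long *words;
};

/* Mark a page as userfault-enabled (not yet populated). */
static void userfault_set(struct userfault_bitmap *bm, uint64_t gfn)
{
	uint64_t idx = gfn - bm->base_gfn;

	assert(idx < bm->npages);
	atomic_fetch_or_explicit(&bm->words[idx / BITS_PER_WORD],
				 1UL << (idx % BITS_PER_WORD),
				 memory_order_release);
}

/* Fault path: should an access to this gfn be reported to userspace? */
static bool userfault_test(const struct userfault_bitmap *bm, uint64_t gfn)
{
	uint64_t idx = gfn - bm->base_gfn;
	unsigned long word;

	assert(idx < bm->npages);
	word = atomic_load_explicit(&bm->words[idx / BITS_PER_WORD],
				    memory_order_acquire);
	return (word >> (idx % BITS_PER_WORD)) & 1UL;
}

/* Resolve a fault; returns true if the page was userfault-enabled. */
static bool userfault_clear(struct userfault_bitmap *bm, uint64_t gfn)
{
	uint64_t idx = gfn - bm->base_gfn;
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);
	unsigned long old;

	assert(idx < bm->npages);
	old = atomic_fetch_and_explicit(&bm->words[idx / BITS_PER_WORD],
					~mask, memory_order_release);
	return (old & mask) != 0;
}
```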

When KVM Userfault is enabled, we need to be careful not to map a
userfault page in response to a fault on a non-userfault page. In this
RFC, I've taken the simplest approach: force new PTEs to be PAGE_SIZE.
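
To make the hazard concrete, here is a small sketch (not from the
series) of the check that would be needed before installing a 2M
mapping: every 4K page under the huge mapping must already be
non-userfault. The RFC avoids this scan entirely by forcing PAGE_SIZE;
the function name and the byte-per-page array are assumptions for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512ULL	/* 2 MiB / 4 KiB */

/*
 * userfault[gfn] is nonzero while the page is still userfault-enabled.
 * Returns true only if the whole 2M-aligned region containing fault_gfn
 * is free of userfault pages, i.e. a huge mapping would not expose an
 * unpopulated page to the guest.
 */
static bool huge_mapping_is_safe(const uint8_t *userfault, uint64_t fault_gfn)
{
	uint64_t start = fault_gfn & ~(PAGES_PER_2M - 1);

	assert(userfault != NULL);
	for (uint64_t gfn = start; gfn < start + PAGES_PER_2M; gfn++) {
		if (userfault[gfn])
			return false;
	}
	return true;
}
```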

--- Page fault notifications ---

For page faults generated by vCPUs running in guest mode, if the page
the vCPU is trying to access is a userfault-enabled page, we use
KVM_EXIT_MEMORY_FAULT with a new flag: KVM_MEMORY_EXIT_FLAG_USERFAULT.
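
From the VMM's side, handling such an exit could look roughly like the
sketch below. The struct mirrors the flags/gpa/size fields that
KVM_EXIT_MEMORY_FAULT reports in struct kvm_run; the flag's bit value
and the helper name are assumptions, since the cover letter does not
give them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Stand-in for the memory_fault member of struct kvm_run. */
struct memory_fault_exit {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/* Hypothetical value for KVM_MEMORY_EXIT_FLAG_USERFAULT. */
#define EXIT_FLAG_USERFAULT (1ULL << 4)

/*
 * If the exit is a userfault, compute the range of guest pages the VMM
 * must fetch from the migration source. The vCPU thread would then
 * populate those pages, mark them non-userfault, and re-enter KVM_RUN.
 */
static bool userfault_exit_range(const struct memory_fault_exit *mf,
				 uint64_t *first_gfn, uint64_t *npages)
{
	uint64_t start, end;

	assert(mf != NULL && first_gfn != NULL && npages != NULL);
	if (!(mf->flags & EXIT_FLAG_USERFAULT) || mf->size == 0)
		return false;

	start = mf->gpa >> PAGE_SHIFT;
	end = (mf->gpa + mf->size - 1) >> PAGE_SHIFT;
	*first_gfn = start;
	*npages = end - start + 1;
	return true;
}
```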

For arm64, I believe this is actually all we need, provided we handle
steal_time properly.

For x86, where returning from deep within the instruction emulator (or
other non-trivial execution paths) is infeasible, being able to pause
execution while userspace fetches the page, just as userfaultfd would
do, is necessary. Let's call these "asynchronous userfaults."

A new ioctl, KVM_READ_USERFAULT, has been added to read asynchronous
userfaults, and an eventfd is used to signal that new faults are
available for reading.
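
The eventfd half of this can be sketched with standard calls. How the
eventfd is registered with KVM and what arguments KVM_READ_USERFAULT
takes are not described in this cover letter, so the ioctl itself is
left as a comment; wait_for_userfaults() is a made-up helper name.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until the eventfd is signaled or the timeout expires.
 * Returns the eventfd counter (number of signals since the last read),
 * 0 on timeout, or -1 on error.
 */
static int64_t wait_for_userfaults(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret;

	assert(efd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;

	/* Reading a non-semaphore eventfd returns and resets the counter. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	/*
	 * At this point a VMM would call KVM_READ_USERFAULT in a loop to
	 * drain the pending asynchronous userfaults.
	 */
	return (int64_t)count;
}
```

Because the counter is reset on read, one wake-up may stand for several
pending faults, so the reader should drain all of them before polling
again.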

Today, we busy-wait for a gfn to have userfault disabled. This will
change in the future.

--- Fault resolution ---

Resolving userfaults today is as simple as removing the USERFAULT memory
attribute on the faulting gfn. This will change if we do not end up
using memory attributes for KVM Userfault. Having a range-based wake-up
like userfaultfd (see UFFDIO_WAKE) might also be helpful for
performance.

Problems with this series
=========================
- This cannot be named KVM Userfault! Perhaps "KVM missing pages"?
- Memory attribute modification doesn't scale well.
- We busy-wait for pages to not be userfault-enabled.
- gfn_to_hva and gfn_to_pfn caches are not invalidated.
- Page tables are not collapsed when KVM Userfault is disabled.
- There is no self-test for asynchronous userfaults.
- Asynchronous page faults can be dropped if KVM_READ_USERFAULT fails.
- Supports only x86 and arm64.
- Probably many more!

Thanks!

[1]: https://lore.kernel.org/kvm/CALzav=d23P5uE=oYqMpjFohvn0CASMJxXB_XEOEi-jtqWcFTDA@mail.gmail.com/
[2]: https://lore.kernel.org/kvm/CADrL8HVwBjLpWDM9i9Co1puFWmJshZOKVu727fMPJUAbD+XX5g@mail.gmail.com/

James Houghton (18):
  KVM: Add KVM_USERFAULT build option
  KVM: Add KVM_CAP_USERFAULT and KVM_MEMORY_ATTRIBUTE_USERFAULT
  KVM: Put struct kvm pointer in memslot
  KVM: Fail __gfn_to_hva_many for userfault gfns.
  KVM: Add KVM_PFN_ERR_USERFAULT
  KVM: Add KVM_MEMORY_EXIT_FLAG_USERFAULT
  KVM: Provide attributes to kvm_arch_pre_set_memory_attributes
  KVM: x86: Add KVM Userfault support
  KVM: x86: Add vCPU fault fast-path for Userfault
  KVM: arm64: Add KVM Userfault support
  KVM: arm64: Add vCPU memory fault fast-path for Userfault
  KVM: arm64: Add userfault support for steal-time
  KVM: Add atomic parameter to __gfn_to_hva_many
  KVM: Add asynchronous userfaults, KVM_READ_USERFAULT
  KVM: guest_memfd: Add KVM Userfault support
  KVM: Advertise KVM_CAP_USERFAULT in KVM_CHECK_EXTENSION
  KVM: selftests: Add KVM Userfault mode to demand_paging_test
  KVM: selftests: Remove restriction in vm_set_memory_attributes

 Documentation/virt/kvm/api.rst                |  23 ++
 arch/arm64/include/asm/kvm_host.h             |   2 +-
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/arm.c                          |   8 +-
 arch/arm64/kvm/mmu.c                          |  45 +++-
 arch/arm64/kvm/pvtime.c                       |  11 +-
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/mmu/mmu.c                        |  67 +++++-
 arch/x86/kvm/mmu/mmu_internal.h               |   3 +-
 include/linux/kvm_host.h                      |  41 +++-
 include/uapi/linux/kvm.h                      |  13 ++
 .../selftests/kvm/demand_paging_test.c        |  46 +++-
 .../testing/selftests/kvm/include/kvm_util.h  |   7 -
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/guest_memfd.c                        |  16 +-
 virt/kvm/kvm_main.c                           | 213 +++++++++++++++++-
 16 files changed, 457 insertions(+), 44 deletions(-)


base-commit: 02b0d3b9d4dd1ef76b3e8c63175f1ae9ff392313
-- 
2.45.2.993.g49e7a77208-goog


