qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Subject: [PULL 16/31] i386/kvm: Prefault memory on page state change
Date: Fri,  6 Jun 2025 14:34:30 +0200	[thread overview]
Message-ID: <20250606123447.538131-17-pbonzini@redhat.com> (raw)
In-Reply-To: <20250606123447.538131-1-pbonzini@redhat.com>

From: Tom Lendacky <thomas.lendacky@amd.com>

A page state change is typically followed by an access of the page(s) and
results in another VMEXIT in order to map the page into the nested page
table. Depending on the size of page state change request, this can
generate a number of additional VMEXITs. For example, under SNP, when
Linux is utilizing lazy memory acceptance, memory is typically accepted in
4M chunks. A page state change request is submitted to mark the pages as
private, followed by validation of the memory. Since the guest_memfd
currently only supports 4K pages, each page validation will result in
VMEXIT to map the page, resulting in 1024 additional exits.

When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
size of the page state change in order to pre-map the pages and avoid the
additional VMEXITs. This helps speed up boot times.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/system/kvm.h  |  1 +
 accel/kvm/kvm-all.c   |  2 ++
 target/i386/kvm/kvm.c | 31 ++++++++++++++++++++++++++-----
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/system/kvm.h b/include/system/kvm.h
index 62ec131d4d8..7cc60d26f24 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -42,6 +42,7 @@ extern bool kvm_gsi_routing_allowed;
 extern bool kvm_gsi_direct_mapping;
 extern bool kvm_readonly_mem_allowed;
 extern bool kvm_msi_use_devid;
+extern bool kvm_pre_fault_memory_supported;
 
 #define kvm_enabled()           (kvm_allowed)
 /**
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 51526d301b9..a31778341c2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -99,6 +99,7 @@ bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_vm_attributes_allowed;
 bool kvm_msi_use_devid;
+bool kvm_pre_fault_memory_supported;
 static bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
@@ -2745,6 +2746,7 @@ static int kvm_init(MachineState *ms)
         kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
         kvm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
         (kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
+    kvm_pre_fault_memory_supported = kvm_vm_check_extension(s, KVM_CAP_PRE_FAULT_MEMORY);
 
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
         s->kernel_irqchip_split = mc->default_kernel_irqchip_split ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a6bc089d020..56a6b9b6381 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6018,9 +6018,11 @@ static bool host_supports_vmx(void)
  * because private/shared page tracking is already provided through other
  * means, these 2 use-cases should be treated as being mutually-exclusive.
  */
-static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
+static int kvm_handle_hc_map_gpa_range(X86CPU *cpu, struct kvm_run *run)
 {
+    struct kvm_pre_fault_memory mem;
     uint64_t gpa, size, attributes;
+    int ret;
 
     if (!machine_require_guest_memfd(current_machine))
         return -EINVAL;
@@ -6031,13 +6033,32 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
 
     trace_kvm_hc_map_gpa_range(gpa, size, attributes, run->hypercall.flags);
 
-    return kvm_convert_memory(gpa, size, attributes & KVM_MAP_GPA_RANGE_ENCRYPTED);
+    ret = kvm_convert_memory(gpa, size, attributes & KVM_MAP_GPA_RANGE_ENCRYPTED);
+    if (ret || !kvm_pre_fault_memory_supported) {
+        return ret;
+    }
+
+    /*
+     * Opportunistically pre-fault memory in. Failures are ignored so that any
+     * errors in faulting in the memory will get captured in KVM page fault
+     * path when the guest first accesses the page.
+     */
+    memset(&mem, 0, sizeof(mem));
+    mem.gpa = gpa;
+    mem.size = size;
+    while (mem.size) {
+        if (kvm_vcpu_ioctl(CPU(cpu), KVM_PRE_FAULT_MEMORY, &mem)) {
+            break;
+        }
+    }
+
+    return 0;
 }
 
-static int kvm_handle_hypercall(struct kvm_run *run)
+static int kvm_handle_hypercall(X86CPU *cpu, struct kvm_run *run)
 {
     if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
-        return kvm_handle_hc_map_gpa_range(run);
+        return kvm_handle_hc_map_gpa_range(cpu, run);
 
     return -EINVAL;
 }
@@ -6137,7 +6158,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         break;
 #endif
     case KVM_EXIT_HYPERCALL:
-        ret = kvm_handle_hypercall(run);
+        ret = kvm_handle_hypercall(cpu, run);
         break;
     case KVM_EXIT_SYSTEM_EVENT:
         switch (run->system_event.type) {
-- 
2.49.0



  parent reply	other threads:[~2025-06-06 12:42 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06 12:34 [PULL 00/31] Threading, Rust, i386 changes for 2025-06-06 Paolo Bonzini
2025-06-06 12:34 ` [PULL 01/31] subprojects: add the anyhow crate Paolo Bonzini
2025-06-06 12:34 ` [PULL 02/31] subprojects: add the foreign crate Paolo Bonzini
2025-06-06 12:34 ` [PULL 03/31] util/error: expose Error definition to Rust code Paolo Bonzini
2025-06-06 12:34 ` [PULL 04/31] util/error: allow non-NUL-terminated err->src Paolo Bonzini
2025-06-06 12:34 ` [PULL 05/31] util/error: make func optional Paolo Bonzini
2025-06-06 12:34 ` [PULL 06/31] rust: qemu-api: add bindings to Error Paolo Bonzini
2025-06-06 12:34 ` [PULL 07/31] rust: qemu-api: add tests for Error bindings Paolo Bonzini
2025-06-06 12:34 ` [PULL 08/31] rust: qdev: support returning errors from realize Paolo Bonzini
2025-06-06 12:34 ` [PULL 09/31] rust/hpet: change type of num_timers to usize Paolo Bonzini
2025-06-06 12:34 ` [PULL 10/31] hpet: adjust VMState for consistency with Rust version Paolo Bonzini
2025-06-06 12:34 ` [PULL 11/31] hpet: return errors from realize if properties are incorrect Paolo Bonzini
2025-06-06 12:34 ` [PULL 12/31] rust/hpet: " Paolo Bonzini
2025-06-06 12:34 ` [PULL 13/31] rust/hpet: Drop BqlCell wrapper for num_timers Paolo Bonzini
2025-06-06 12:34 ` [PULL 14/31] docs: update Rust module status Paolo Bonzini
2025-06-06 12:34 ` [PULL 15/31] rust: make TryFrom macro more resilient Paolo Bonzini
2025-06-06 12:34 ` Paolo Bonzini [this message]
2025-06-11  2:55   ` [PULL 16/31] i386/kvm: Prefault memory on page state change Xiaoyao Li
2025-06-11  6:12     ` Paolo Bonzini
2025-06-11  6:44       ` Xiaoyao Li
2025-06-06 12:34 ` [PULL 17/31] futex: Check value after qemu_futex_wait() Paolo Bonzini
2025-06-06 12:34 ` [PULL 18/31] futex: Support Windows Paolo Bonzini
2025-06-06 12:34 ` [PULL 19/31] qemu-thread: Replace __linux__ with CONFIG_LINUX Paolo Bonzini
2025-06-06 12:34 ` [PULL 20/31] qemu-thread: Avoid futex abstraction for non-Linux Paolo Bonzini
2025-06-06 12:34 ` [PULL 21/31] qemu-thread: Use futex for QemuEvent on Windows Paolo Bonzini
2025-06-06 12:34 ` [PULL 22/31] qemu-thread: Use futex if available for QemuLockCnt Paolo Bonzini
2025-06-06 12:34 ` [PULL 23/31] qemu-thread: Document QemuEvent Paolo Bonzini
2025-06-06 12:34 ` [PULL 24/31] migration: Replace QemuSemaphore with QemuEvent Paolo Bonzini
2025-06-06 12:34 ` [PULL 25/31] migration/colo: " Paolo Bonzini
2025-06-06 12:34 ` [PULL 26/31] migration/postcopy: " Paolo Bonzini
2025-06-06 12:34 ` [PULL 27/31] hw/display/apple-gfx: " Paolo Bonzini
2025-06-06 12:34 ` [PULL 28/31] target/i386: Detect flush-to-zero after rounding Paolo Bonzini
2025-06-06 12:34 ` [PULL 29/31] target/i386: Use correct type for get_float_exception_flags() values Paolo Bonzini
2025-06-06 12:34 ` [PULL 30/31] target/i386: Wire up MXCSR.DE and FPUS.DE correctly Paolo Bonzini
2025-06-06 12:34 ` [PULL 31/31] tests/tcg/x86_64/fma: add test for exact-denormal output Paolo Bonzini
2025-06-06 15:27 ` [PULL 00/31] Threading, Rust, i386 changes for 2025-06-06 Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250606123447.538131-17-pbonzini@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thomas.lendacky@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).