qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/4] Enable notify VM exit
@ 2022-09-29  7:03 Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 1/4] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

Notify VM exit is introduced to mitigate the potential DOS attach from
malicious VM. This series is the userspace part to enable this feature
through a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT. The detailed
info can be seen in Patch 4.

The corresponding KVM support can be found in linux 6.0-rc:
(2f4073e08f4c KVM: VMX: Enable Notify VM exit)

---
Change logs:
v7 -> v8
- Add triple_fault_pending field transmission on migration (Paolo)
- Change the notify-vmexit and notify-window to the accelerator property. Add it as
  a x86-specific property. (Paolo)
- Add a preparation patch to expose struct KVMState in order to add target-specific property.
- Define three option for notify-vmexit. Make it on by default. (Paolo)
- Raise a KVM internal error instead of triple fault if invalid context of guest VMCS detected.
- v7: https://lore.kernel.org/qemu-devel/20220923073333.23381-1-chenyi.qiang@intel.com/

v6 -> v7
- Add a warning message when exiting to userspace (Peter Xu)
- v6: https://lore.kernel.org/all/20220915092839.5518-1-chenyi.qiang@intel.com/

v5 -> v6
- Add some info related to the valid range of notify_window in patch 2. (Peter Xu)
- Add the doc in qemu-options.hx. (Peter Xu)
- v5: https://lore.kernel.org/qemu-devel/20220817020845.21855-1-chenyi.qiang@intel.com/

---

Chenyi Qiang (3):
  i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple
    fault
  kvm: expose struct KVMState
  i386: add notify VM exit support

Paolo Bonzini (1):
  kvm: allow target-specific accelerator properties

 accel/kvm/kvm-all.c      |  78 ++-----------------------
 include/sysemu/kvm.h     |   2 +
 include/sysemu/kvm_int.h |  75 ++++++++++++++++++++++++
 qapi/run-state.json      |  17 ++++++
 qemu-options.hx          |  11 ++++
 target/arm/kvm.c         |   4 ++
 target/i386/cpu.c        |   1 +
 target/i386/cpu.h        |   1 +
 target/i386/kvm/kvm.c    | 121 +++++++++++++++++++++++++++++++++++++++
 target/i386/machine.c    |  20 +++++++
 target/mips/kvm.c        |   4 ++
 target/ppc/kvm.c         |   4 ++
 target/riscv/kvm.c       |   4 ++
 target/s390x/kvm/kvm.c   |   4 ++
 14 files changed, 272 insertions(+), 74 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v8 1/4] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault
  2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
@ 2022-09-29  7:03 ` Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 2/4] kvm: allow target-specific accelerator properties Chenyi Qiang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults sythesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 target/i386/cpu.c     |  1 +
 target/i386/cpu.h     |  1 +
 target/i386/kvm/kvm.c | 20 ++++++++++++++++++++
 target/i386/machine.c | 20 ++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a59..6e107466b3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,6 +6017,7 @@ static void x86_cpu_reset(DeviceState *dev)
     env->exception_has_payload = false;
     env->exception_payload = 0;
     env->nmi_injected = false;
+    env->triple_fault_pending = false;
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     apic_designate_bsp(cpu->apic_state, s->cpu_index == 0);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b9..d4124973ce 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1739,6 +1739,7 @@ typedef struct CPUArchState {
     uint8_t has_error_code;
     uint8_t exception_has_payload;
     uint64_t exception_payload;
+    uint8_t triple_fault_pending;
     uint32_t ins_len;
     uint32_t sipi_vector;
     bool tsc_valid;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a1fd1f5379..3838827134 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -132,6 +132,7 @@ static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
 static int has_exception_payload;
+static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -2483,6 +2484,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    has_triple_fault_event = kvm_check_extension(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT);
+    if (has_triple_fault_event) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT, 0, true);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable triple fault event cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
@@ -4299,6 +4310,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
         }
     }
 
+    if (has_triple_fault_event) {
+        events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+        events.triple_fault.pending = env->triple_fault_pending;
+    }
+
     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 }
 
@@ -4368,6 +4384,10 @@ static int kvm_get_vcpu_events(X86CPU *cpu)
         }
     }
 
+    if (events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) {
+        env->triple_fault_pending = events.triple_fault.pending;
+    }
+
     env->sipi_vector = events.sipi_vector;
 
     return 0;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index cecd476e98..310b125235 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1562,6 +1562,25 @@ static const VMStateDescription vmstate_arch_lbr = {
     }
 };
 
+static bool triple_fault_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return env->triple_fault_pending;
+}
+
+static const VMStateDescription vmstate_triple_fault = {
+    .name = "cpu/triple_fault",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = triple_fault_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(env.triple_fault_pending, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
     .version_id = 12,
@@ -1706,6 +1725,7 @@ const VMStateDescription vmstate_x86_cpu = {
         &vmstate_amx_xtile,
 #endif
         &vmstate_arch_lbr,
+        &vmstate_triple_fault,
         NULL
     }
 };
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v8 2/4] kvm: allow target-specific accelerator properties
  2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 1/4] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
@ 2022-09-29  7:03 ` Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 3/4] kvm: expose struct KVMState Chenyi Qiang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: qemu-devel, kvm

From: Paolo Bonzini <pbonzini@redhat.com>

Several hypervisor capabilities in KVM are target-specific.  When exposed
to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
should not be available for all targets.

Add a hook for targets to add their own properties to -accel kvm, for
now no such property is defined.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 accel/kvm/kvm-all.c    | 2 ++
 include/sysemu/kvm.h   | 2 ++
 target/arm/kvm.c       | 4 ++++
 target/i386/kvm/kvm.c  | 4 ++++
 target/mips/kvm.c      | 4 ++++
 target/ppc/kvm.c       | 4 ++++
 target/riscv/kvm.c     | 4 ++++
 target/s390x/kvm/kvm.c | 4 ++++
 8 files changed, 28 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5acab1767f..f90c5cb285 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3737,6 +3737,8 @@ static void kvm_accel_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "dirty-ring-size",
         "Size of KVM dirty page ring buffer (default: 0, i.e. use bitmap)");
+
+    kvm_arch_accel_class_init(oc);
 }
 
 static const TypeInfo kvm_accel_type = {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index efd6dee818..50868ebf60 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -353,6 +353,8 @@ bool kvm_device_supported(int vmfd, uint64_t type);
 
 extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
 
+void kvm_arch_accel_class_init(ObjectClass *oc);
+
 void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run);
 MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e5c1bd50d2..d21603cf28 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1056,3 +1056,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3838827134..eab09833f9 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5472,3 +5472,7 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
         mask &= ~BIT_ULL(bit);
     }
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index caf70decd2..bcb8e06b2c 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -1294,3 +1294,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 466d0d2f4c..7c25348b7b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2966,3 +2966,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 70b4cff06f..30f21453d6 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -532,3 +532,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 6a8dbadf7e..508c24cfec 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2581,3 +2581,7 @@ int kvm_s390_get_zpci_op(void)
 {
     return cap_zpci_op;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v8 3/4] kvm: expose struct KVMState
  2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 1/4] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 2/4] kvm: allow target-specific accelerator properties Chenyi Qiang
@ 2022-09-29  7:03 ` Chenyi Qiang
  2022-09-29  7:03 ` [PATCH v8 4/4] i386: add notify VM exit support Chenyi Qiang
  2022-09-29 17:28 ` [PATCH v8 0/4] Enable notify VM exit Paolo Bonzini
  4 siblings, 0 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

Expose struct KVMState out of kvm-all.c so that the field of struct
KVMState can be accessed when defining target-specific accelerator
properties.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 accel/kvm/kvm-all.c      | 74 ---------------------------------------
 include/sysemu/kvm_int.h | 75 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 74 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f90c5cb285..3624ed8447 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -77,86 +77,12 @@
     do { } while (0)
 #endif
 
-#define KVM_MSI_HASHTAB_SIZE    256
-
 struct KVMParkedVcpu {
     unsigned long vcpu_id;
     int kvm_fd;
     QLIST_ENTRY(KVMParkedVcpu) node;
 };
 
-enum KVMDirtyRingReaperState {
-    KVM_DIRTY_RING_REAPER_NONE = 0,
-    /* The reaper is sleeping */
-    KVM_DIRTY_RING_REAPER_WAIT,
-    /* The reaper is reaping for dirty pages */
-    KVM_DIRTY_RING_REAPER_REAPING,
-};
-
-/*
- * KVM reaper instance, responsible for collecting the KVM dirty bits
- * via the dirty ring.
- */
-struct KVMDirtyRingReaper {
-    /* The reaper thread */
-    QemuThread reaper_thr;
-    volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
-    volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
-};
-
-struct KVMState
-{
-    AccelState parent_obj;
-
-    int nr_slots;
-    int fd;
-    int vmfd;
-    int coalesced_mmio;
-    int coalesced_pio;
-    struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
-    bool coalesced_flush_in_progress;
-    int vcpu_events;
-    int robust_singlestep;
-    int debugregs;
-#ifdef KVM_CAP_SET_GUEST_DEBUG
-    QTAILQ_HEAD(, kvm_sw_breakpoint) kvm_sw_breakpoints;
-#endif
-    int max_nested_state_len;
-    int many_ioeventfds;
-    int intx_set_mask;
-    int kvm_shadow_mem;
-    bool kernel_irqchip_allowed;
-    bool kernel_irqchip_required;
-    OnOffAuto kernel_irqchip_split;
-    bool sync_mmu;
-    uint64_t manual_dirty_log_protect;
-    /* The man page (and posix) say ioctl numbers are signed int, but
-     * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
-     * unsigned, and treating them as signed here can break things */
-    unsigned irq_set_ioctl;
-    unsigned int sigmask_len;
-    GHashTable *gsimap;
-#ifdef KVM_CAP_IRQ_ROUTING
-    struct kvm_irq_routing *irq_routes;
-    int nr_allocated_irq_routes;
-    unsigned long *used_gsi_bitmap;
-    unsigned int gsi_count;
-    QTAILQ_HEAD(, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
-#endif
-    KVMMemoryListener memory_listener;
-    QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
-
-    /* For "info mtree -f" to tell if an MR is registered in KVM */
-    int nr_as;
-    struct KVMAs {
-        KVMMemoryListener *ml;
-        AddressSpace *as;
-    } *as;
-    uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
-    uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
-    struct KVMDirtyRingReaper reaper;
-};
-
 KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_split_irqchip;
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 1f5487d9b7..07394744ad 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -36,6 +36,81 @@ typedef struct KVMMemoryListener {
     int as_id;
 } KVMMemoryListener;
 
+#define KVM_MSI_HASHTAB_SIZE    256
+
+enum KVMDirtyRingReaperState {
+    KVM_DIRTY_RING_REAPER_NONE = 0,
+    /* The reaper is sleeping */
+    KVM_DIRTY_RING_REAPER_WAIT,
+    /* The reaper is reaping for dirty pages */
+    KVM_DIRTY_RING_REAPER_REAPING,
+};
+
+/*
+ * KVM reaper instance, responsible for collecting the KVM dirty bits
+ * via the dirty ring.
+ */
+struct KVMDirtyRingReaper {
+    /* The reaper thread */
+    QemuThread reaper_thr;
+    volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
+    volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
+};
+struct KVMState
+{
+    AccelState parent_obj;
+
+    int nr_slots;
+    int fd;
+    int vmfd;
+    int coalesced_mmio;
+    int coalesced_pio;
+    struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
+    bool coalesced_flush_in_progress;
+    int vcpu_events;
+    int robust_singlestep;
+    int debugregs;
+#ifdef KVM_CAP_SET_GUEST_DEBUG
+    QTAILQ_HEAD(, kvm_sw_breakpoint) kvm_sw_breakpoints;
+#endif
+    int max_nested_state_len;
+    int many_ioeventfds;
+    int intx_set_mask;
+    int kvm_shadow_mem;
+    bool kernel_irqchip_allowed;
+    bool kernel_irqchip_required;
+    OnOffAuto kernel_irqchip_split;
+    bool sync_mmu;
+    uint64_t manual_dirty_log_protect;
+    /* The man page (and posix) say ioctl numbers are signed int, but
+     * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
+     * unsigned, and treating them as signed here can break things */
+    unsigned irq_set_ioctl;
+    unsigned int sigmask_len;
+    GHashTable *gsimap;
+#ifdef KVM_CAP_IRQ_ROUTING
+    struct kvm_irq_routing *irq_routes;
+    int nr_allocated_irq_routes;
+    unsigned long *used_gsi_bitmap;
+    unsigned int gsi_count;
+    QTAILQ_HEAD(, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
+#endif
+    KVMMemoryListener memory_listener;
+    QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
+
+    /* For "info mtree -f" to tell if an MR is registered in KVM */
+    int nr_as;
+    struct KVMAs {
+        KVMMemoryListener *ml;
+        AddressSpace *as;
+    } *as;
+    uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
+    uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+    struct KVMDirtyRingReaper reaper;
+    NotifyVmexitOption notify_vmexit;
+    uint32_t notify_window;
+};
+
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
                                   AddressSpace *as, int as_id, const char *name);
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v8 4/4] i386: add notify VM exit support
  2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
                   ` (2 preceding siblings ...)
  2022-09-29  7:03 ` [PATCH v8 3/4] kvm: expose struct KVMState Chenyi Qiang
@ 2022-09-29  7:03 ` Chenyi Qiang
  2022-09-29 17:28 ` [PATCH v8 0/4] Enable notify VM exit Paolo Bonzini
  4 siblings, 0 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-29  7:03 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: Chenyi Qiang, qemu-devel, kvm

There are cases that malicious virtual machine can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs. Notify VM exit is introduced to mitigate such kind of
attacks, which will generate a VM exit if no event window occurs in VM
non-root mode for a specified amount of time (notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
  Bit 63:32 - notify window specified in qemu command
  Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
              enable the feature.)

Users can configure the feature by a new (x86 only) accel property:
    qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n

The default option of notify-vmexit is run, which will enable the
capability and do nothing if the exit happens. The internal-error option
raises a KVM internal error if it happens. The disable option does not
enable the capability. The default value of notify-window is 0. It is valid
only when notify-vmexit is not disabled. The valid range of notify-window
is non-negative. It is even safe to set it to zero since there's an
internal hardware threshold to be added to ensure no false positive.

Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit
qualification (no cases are anticipated that would set this bit), which
means VM context is corrupted. It would be reflected in the flags of
KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM
internal error unconditionally.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 accel/kvm/kvm-all.c   |  2 +
 qapi/run-state.json   | 17 ++++++++
 qemu-options.hx       | 11 +++++
 target/i386/kvm/kvm.c | 97 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 3624ed8447..41ba9de3b8 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3636,6 +3636,8 @@ static void kvm_accel_instance_init(Object *obj)
     s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
     /* KVM dirty ring is by default off */
     s->kvm_dirty_ring_size = 0;
+    s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
+    s->notify_window = 0;
 }
 
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 9273ea6516..49989d30e6 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -643,3 +643,20 @@
 { 'struct': 'MemoryFailureFlags',
   'data': { 'action-required': 'bool',
             'recursive': 'bool'} }
+
+##
+# @NotifyVmexitOption:
+#
+# An enumeration of the options specified when enabling notify VM exit
+#
+# @run: enable the feature, do nothing and continue if the notify VM exit happens.
+#
+# @internal-error: enable the feature, raise a internal error if the notify
+#                  VM exit happens.
+#
+# @disable: disable the feature.
+#
+# Since: 7.2
+##
+{ 'enum': 'NotifyVmexitOption',
+  'data': [ 'run', 'internal-error', 'disable' ] }
\ No newline at end of file
diff --git a/qemu-options.hx b/qemu-options.hx
index 913c71e38f..8f85004a7d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -191,6 +191,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "                split-wx=on|off (enable TCG split w^x mapping)\n"
     "                tb-size=n (TCG translation block cache size)\n"
     "                dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
+    "                notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
     "                thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL)
 SRST
 ``-accel name[,prop=value[,...]]``
@@ -242,6 +243,16 @@ SRST
         is disabled (dirty-ring-size=0).  When enabled, KVM will instead
         record dirty pages in a bitmap.
 
+    ``notify-vmexit=run|internal-error|disable,notify-window=n``
+        Enables or disables notify VM exit support on x86 host and specify
+        the corresponding notify window to trigger the VM exit if enabled.
+        ``run`` option enables the feature. It does nothing and continue
+        if the exit happens. ``internal-error`` option enables the feature.
+        It raises a internal error. ``disable`` option doesn't enable the feature.
+        This feature can mitigate the CPU stuck issue due to event windows don't
+        open up for a specified of time (i.e. notify-window).
+        Default: notify-vmexit=run,notify-window=0.
+
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index eab09833f9..55f521e4ec 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -15,6 +15,7 @@
 #include "qemu/osdep.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/error.h"
+#include "qapi/visitor.h"
 #include <sys/ioctl.h>
 #include <sys/utsname.h>
 #include <sys/syscall.h>
@@ -2599,6 +2600,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    if (s->notify_vmexit != NOTIFY_VMEXIT_OPTION_DISABLE &&
+        kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+            uint64_t notify_window_flags =
+                ((uint64_t)s->notify_window << 32) |
+                KVM_X86_NOTIFY_VMEXIT_ENABLED |
+                KVM_X86_NOTIFY_VMEXIT_USER;
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
+                                    notify_window_flags);
+            if (ret < 0) {
+                error_report("kvm: Failed to enable notify vmexit cap: %s",
+                             strerror(-ret));
+                return ret;
+            }
+    }
+
     return 0;
 }
 
@@ -5141,6 +5157,8 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
     X86CPU *cpu = X86_CPU(cs);
     uint64_t code;
     int ret;
+    bool ctx_invalid;
+    char str[256];
 
     switch (run->exit_reason) {
     case KVM_EXIT_HLT:
@@ -5196,6 +5214,21 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_NOTIFY:
+        KVMState *s = KVM_STATE(current_accel());
+        ctx_invalid = !!(run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID);
+        sprintf(str, "Encounter a notify exit with %svalid context in"
+                     " guest. There can be possible misbehaves in guest."
+                     " Please have a look.", ctx_invalid ? "in" : "");
+        if (ctx_invalid ||
+            s->notify_vmexit == NOTIFY_VMEXIT_OPTION_INTERNAL_ERROR) {
+            warn_report("KVM internal error: %s", str);
+            ret = -1;
+        } else {
+            warn_report_once("KVM: %s", str);
+            ret = 0;
+        }
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
@@ -5473,6 +5506,70 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
     }
 }
 
+static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    return s->notify_vmexit;
+}
+
+static void kvm_arch_set_notify_vmexit(Object *obj, int value, Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+
+    if (s->fd != -1) {
+        error_setg(errp, "Cannot set properties after the accelerator has been initialized");
+        return;
+    }
+
+    s->notify_vmexit = value;
+}
+
+static void kvm_arch_get_notify_window(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    uint32_t value = s->notify_window;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_arch_set_notify_window(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    Error *error = NULL;
+    uint32_t value;
+
+    if (s->fd != -1) {
+        error_setg(errp, "Cannot set properties after the accelerator has been initialized");
+        return;
+    }
+
+    visit_type_uint32(v, name, &value, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+
+    s->notify_window = value;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
+    object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
+                                   &NotifyVmexitOption_lookup,
+                                   kvm_arch_get_notify_vmexit,
+                                   kvm_arch_set_notify_vmexit);
+    object_class_property_set_description(oc, "notify-vmexit",
+                                          "Enable notify VM exit");
+
+    object_class_property_add(oc, "notify-window", "uint32",
+                              kvm_arch_get_notify_window,
+                              kvm_arch_set_notify_window,
+                              NULL, NULL);
+    object_class_property_set_description(oc, "notify-window",
+                                          "Clock cycles without an event window "
+                                          "after which a notification VM exit occurs");
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 0/4] Enable notify VM exit
  2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
                   ` (3 preceding siblings ...)
  2022-09-29  7:03 ` [PATCH v8 4/4] i386: add notify VM exit support Chenyi Qiang
@ 2022-09-29 17:28 ` Paolo Bonzini
  2022-09-30  0:42   ` Chenyi Qiang
  4 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2022-09-29 17:28 UTC (permalink / raw)
  To: Chenyi Qiang, Marcelo Tosatti, Richard Henderson, Eduardo Habkost,
	Peter Xu, Xiaoyao Li
  Cc: qemu-devel, kvm

On 9/29/22 09:03, Chenyi Qiang wrote:
> Notify VM exit is introduced to mitigate the potential DOS attach from
> malicious VM. This series is the userspace part to enable this feature
> through a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT. The detailed
> info can be seen in Patch 4.
> 
> The corresponding KVM support can be found in linux 6.0-rc:
> (2f4073e08f4c KVM: VMX: Enable Notify VM exit)

Thanks, I will queue this in my next pull request.

Paolo

> ---
> Change logs:
> v7 -> v8
> - Add triple_fault_pending field transmission on migration (Paolo)
> - Change the notify-vmexit and notify-window to the accelerator property. Add it as
>    a x86-specific property. (Paolo)
> - Add a preparation patch to expose struct KVMState in order to add target-specific property.
> - Define three option for notify-vmexit. Make it on by default. (Paolo)
> - Raise a KVM internal error instead of triple fault if invalid context of guest VMCS detected.
> - v7: https://lore.kernel.org/qemu-devel/20220923073333.23381-1-chenyi.qiang@intel.com/
> 
> v6 -> v7
> - Add a warning message when exiting to userspace (Peter Xu)
> - v6: https://lore.kernel.org/all/20220915092839.5518-1-chenyi.qiang@intel.com/
> 
> v5 -> v6
> - Add some info related to the valid range of notify_window in patch 2. (Peter Xu)
> - Add the doc in qemu-options.hx. (Peter Xu)
> - v5: https://lore.kernel.org/qemu-devel/20220817020845.21855-1-chenyi.qiang@intel.com/
> 
> ---
> 
> Chenyi Qiang (3):
>    i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple
>      fault
>    kvm: expose struct KVMState
>    i386: add notify VM exit support
> 
> Paolo Bonzini (1):
>    kvm: allow target-specific accelerator properties
> 
>   accel/kvm/kvm-all.c      |  78 ++-----------------------
>   include/sysemu/kvm.h     |   2 +
>   include/sysemu/kvm_int.h |  75 ++++++++++++++++++++++++
>   qapi/run-state.json      |  17 ++++++
>   qemu-options.hx          |  11 ++++
>   target/arm/kvm.c         |   4 ++
>   target/i386/cpu.c        |   1 +
>   target/i386/cpu.h        |   1 +
>   target/i386/kvm/kvm.c    | 121 +++++++++++++++++++++++++++++++++++++++
>   target/i386/machine.c    |  20 +++++++
>   target/mips/kvm.c        |   4 ++
>   target/ppc/kvm.c         |   4 ++
>   target/riscv/kvm.c       |   4 ++
>   target/s390x/kvm/kvm.c   |   4 ++
>   14 files changed, 272 insertions(+), 74 deletions(-)
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 0/4] Enable notify VM exit
  2022-09-29 17:28 ` [PATCH v8 0/4] Enable notify VM exit Paolo Bonzini
@ 2022-09-30  0:42   ` Chenyi Qiang
  0 siblings, 0 replies; 7+ messages in thread
From: Chenyi Qiang @ 2022-09-30  0:42 UTC (permalink / raw)
  To: Paolo Bonzini, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Peter Xu, Xiaoyao Li
  Cc: qemu-devel, kvm



On 9/30/2022 1:28 AM, Paolo Bonzini wrote:
> On 9/29/22 09:03, Chenyi Qiang wrote:
>> Notify VM exit is introduced to mitigate the potential DOS attach from
>> malicious VM. This series is the userspace part to enable this feature
>> through a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT. The detailed
>> info can be seen in Patch 4.
>>
>> The corresponding KVM support can be found in linux 6.0-rc:
>> (2f4073e08f4c KVM: VMX: Enable Notify VM exit)
> 
> Thanks, I will queue this in my next pull request.
> 
> Paolo
> 

Thanks Paolo!

Please take the resend version at 
https://lore.kernel.org/qemu-devel/20220929072014.20705-1-chenyi.qiang@intel.com/

There's a minor compile issue in this one.

Chenyi

>> ---
>> Change logs:
>> v7 -> v8
>> - Add triple_fault_pending field transmission on migration (Paolo)
>> - Change the notify-vmexit and notify-window to the accelerator 
>> property. Add it as
>>    a x86-specific property. (Paolo)
>> - Add a preparation patch to expose struct KVMState in order to add 
>> target-specific property.
>> - Define three option for notify-vmexit. Make it on by default. (Paolo)
>> - Raise a KVM internal error instead of triple fault if invalid 
>> context of guest VMCS detected.
>> - v7: 
>> https://lore.kernel.org/qemu-devel/20220923073333.23381-1-chenyi.qiang@intel.com/
>>
>> v6 -> v7
>> - Add a warning message when exiting to userspace (Peter Xu)
>> - v6: 
>> https://lore.kernel.org/all/20220915092839.5518-1-chenyi.qiang@intel.com/
>>
>> v5 -> v6
>> - Add some info related to the valid range of notify_window in patch 
>> 2. (Peter Xu)
>> - Add the doc in qemu-options.hx. (Peter Xu)
>> - v5: 
>> https://lore.kernel.org/qemu-devel/20220817020845.21855-1-chenyi.qiang@intel.com/
>>
>> ---
>>
>> Chenyi Qiang (3):
>>    i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple
>>      fault
>>    kvm: expose struct KVMState
>>    i386: add notify VM exit support
>>
>> Paolo Bonzini (1):
>>    kvm: allow target-specific accelerator properties
>>
>>   accel/kvm/kvm-all.c      |  78 ++-----------------------
>>   include/sysemu/kvm.h     |   2 +
>>   include/sysemu/kvm_int.h |  75 ++++++++++++++++++++++++
>>   qapi/run-state.json      |  17 ++++++
>>   qemu-options.hx          |  11 ++++
>>   target/arm/kvm.c         |   4 ++
>>   target/i386/cpu.c        |   1 +
>>   target/i386/cpu.h        |   1 +
>>   target/i386/kvm/kvm.c    | 121 +++++++++++++++++++++++++++++++++++++++
>>   target/i386/machine.c    |  20 +++++++
>>   target/mips/kvm.c        |   4 ++
>>   target/ppc/kvm.c         |   4 ++
>>   target/riscv/kvm.c       |   4 ++
>>   target/s390x/kvm/kvm.c   |   4 ++
>>   14 files changed, 272 insertions(+), 74 deletions(-)
>>
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-09-30  0:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-29  7:03 [PATCH v8 0/4] Enable notify VM exit Chenyi Qiang
2022-09-29  7:03 ` [PATCH v8 1/4] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault Chenyi Qiang
2022-09-29  7:03 ` [PATCH v8 2/4] kvm: allow target-specific accelerator properties Chenyi Qiang
2022-09-29  7:03 ` [PATCH v8 3/4] kvm: expose struct KVMState Chenyi Qiang
2022-09-29  7:03 ` [PATCH v8 4/4] i386: add notify VM exit support Chenyi Qiang
2022-09-29 17:28 ` [PATCH v8 0/4] Enable notify VM exit Paolo Bonzini
2022-09-30  0:42   ` Chenyi Qiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).