* [PATCH v3] i386/kvm: Provide knob to disable hypercall patching quirk
@ 2025-08-01 13:12 Mathias Krause
2025-08-04 3:32 ` Xiaoyao Li
0 siblings, 1 reply; 3+ messages in thread
From: Mathias Krause @ 2025-08-01 13:12 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
Cc: Marcelo Tosatti, Xiaoyao Li, kvm, Mathias Krause, Oliver Upton,
Sean Christopherson
KVM has a weird behaviour when a guest executes VMCALL on an AMD system
or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
exception (#UD) as they are just the wrong instruction for the CPU
given. But instead of forwarding the exception to the guest, KVM tries
to patch the guest instruction to match the host's actual hypercall
instruction. That is doomed to fail for regular operating systems, as
read-only code is rather the standard these days. But, instead of
letting go the patching attempt and falling back to #UD injection, KVM
propagates its failure and injects the page fault instead.
That's wrong on multiple levels. Not only isn't that a valid exception
to be generated by these instructions, confusing attempts to handle
them. It also destroys guest state by doing so, namely the value of CR2.
Sean attempted to fix that in KVM[1] but the patch was never applied.
Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
conceptually be disabled. Paolo even called out to add this very
functionality to disable the quirk in QEMU[3]. So lets just do it.
Add a new property 'hypercall-patching=on|off' to the KVM accelerator
but keep the default behaviour as-is as there are, unfortunately,
systems out there relying on the patching, e.g. KUT[4,5].
For regular operating systems, however, the patching wouldn't be needed,
nor work at all. If it would, these systrems would be vulnerable to
memory corruption attacks, freely overwriting kernel code as they
please.
[1] https://lore.kernel.org/kvm/20211210222903.3417968-1-seanjc@google.com/
[2] https://lore.kernel.org/kvm/20220316005538.2282772-2-oupton@google.com/
[3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665-c00e4fe9cb9c@redhat.com/
[4] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/f045ea5627a3/x86/apic.c#L644
[5] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/f045ea5627a3/x86/vmexit.c#L36
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
Xiaoyao, I left out your Tested-by and Reviewed-by as I changed the code
(slightly) and it didn't felt right to pick these up. However, as only
the default value changed, the functionality would be the same if you
tested both cases explicitly (-accel kvm,hypercall-patching={on,off}).
v3:
- switch default to 'on' to not change the default behaviour
- reference KUT tests relying on hypercall patching
v2:
- rename hypercall_patching_enabled to hypercall_patching (Xiaoyao Li)
- make use of error_setg*() (Xiaoyao Li)
accel/kvm/kvm-all.c | 1 +
include/system/kvm_int.h | 1 +
qemu-options.hx | 10 +++++++++
target/i386/kvm/kvm.c | 45 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 57 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 890d5ea9f865..a68f779b6c1c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3997,6 +3997,7 @@ static void kvm_accel_instance_init(Object *obj)
s->kvm_dirty_ring_size = 0;
s->kvm_dirty_ring_with_bitmap = false;
s->kvm_eager_split_size = 0;
+ s->hypercall_patching = true;
s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
s->notify_window = 0;
s->xen_version = 0;
diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
index 9247493b0299..ec891ca8e302 100644
--- a/include/system/kvm_int.h
+++ b/include/system/kvm_int.h
@@ -160,6 +160,7 @@ struct KVMState
uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */
struct KVMDirtyRingReaper reaper;
struct KVMMsrEnergy msr_energy;
+ bool hypercall_patching;
NotifyVmexitOption notify_vmexit;
uint32_t notify_window;
uint32_t xen_version;
diff --git a/qemu-options.hx b/qemu-options.hx
index ab23f14d2178..98af1a91e6e6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -236,6 +236,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
" dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
" eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
" notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
+ " hypercall-patching=on|off (disable KVM's VMCALL/VMMCALL hypercall patching quirk, x86 only)\n"
" thread=single|multi (enable multi-threaded TCG)\n"
" device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL)
SRST
@@ -318,6 +319,15 @@ SRST
open up for a specified of time (i.e. notify-window).
Default: notify-vmexit=run,notify-window=0.
+ ``hypercall-patching=on|off``
+ KVM tries to recover from the wrong hypercall instruction being used by
+ a guest by attempting to rewrite it to the one supported natively by
+ the host CPU (VMCALL on Intel, VMMCALL for AMD systems). However, this
+ patching may fail if the guest memory is write protected, leading to a
+ page fault getting propagated to the guest instead of an illegal
+ instruction exception. As this may confuse guests, this option allows
+ disabling it (x86 only, enabled by default).
+
``device=path``
Sets the path to the KVM device node. Defaults to ``/dev/kvm``. This
option can be used to pass the KVM device to use via a file descriptor
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 369626f8c8d7..a841d53c240f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3228,6 +3228,26 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
return 0;
}
+static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp)
+{
+ int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2);
+ int ret = -1;
+
+ if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) {
+ ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0,
+ KVM_X86_QUIRK_FIX_HYPERCALL_INSN);
+ if (ret) {
+ error_setg_errno(errp, -ret, "kvm: failed to disable "
+ "hypercall patching quirk: %s",
+ strerror(-ret));
+ }
+ } else {
+ error_setg(errp, "kvm: disabling hypercall patching not supported");
+ }
+
+ return ret;
+}
+
int kvm_arch_init(MachineState *ms, KVMState *s)
{
int ret;
@@ -3367,6 +3387,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
}
+ if (!s->hypercall_patching) {
+ if (kvm_vm_disable_hypercall_patching(s, &local_err)) {
+ error_report_err(local_err);
+ }
+ }
+
return 0;
}
@@ -6478,6 +6504,19 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
}
}
+static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp)
+{
+ KVMState *s = KVM_STATE(obj);
+ return s->hypercall_patching;
+}
+
+static void kvm_arch_set_hypercall_patching(Object *obj, bool value,
+ Error **errp)
+{
+ KVMState *s = KVM_STATE(obj);
+ s->hypercall_patching = value;
+}
+
static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp)
{
KVMState *s = KVM_STATE(obj);
@@ -6611,6 +6650,12 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v,
void kvm_arch_accel_class_init(ObjectClass *oc)
{
+ object_class_property_add_bool(oc, "hypercall-patching",
+ kvm_arch_get_hypercall_patching,
+ kvm_arch_set_hypercall_patching);
+ object_class_property_set_description(oc, "hypercall-patching",
+ "Disable hypercall patching quirk");
+
object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
&NotifyVmexitOption_lookup,
kvm_arch_get_notify_vmexit,
--
2.30.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v3] i386/kvm: Provide knob to disable hypercall patching quirk
2025-08-01 13:12 [PATCH v3] i386/kvm: Provide knob to disable hypercall patching quirk Mathias Krause
@ 2025-08-04 3:32 ` Xiaoyao Li
2025-08-04 9:16 ` Mathias Krause
0 siblings, 1 reply; 3+ messages in thread
From: Xiaoyao Li @ 2025-08-04 3:32 UTC (permalink / raw)
To: Mathias Krause, Paolo Bonzini, qemu-devel
Cc: Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson
On 8/1/2025 9:12 PM, Mathias Krause wrote:
> KVM has a weird behaviour when a guest executes VMCALL on an AMD system
> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
> exception (#UD) as they are just the wrong instruction for the CPU
> given. But instead of forwarding the exception to the guest, KVM tries
> to patch the guest instruction to match the host's actual hypercall
> instruction. That is doomed to fail for regular operating systems, as
> read-only code is rather the standard these days. But, instead of
> letting go the patching attempt and falling back to #UD injection, KVM
> propagates its failure and injects the page fault instead.
>
> That's wrong on multiple levels. Not only isn't that a valid exception
> to be generated by these instructions, confusing attempts to handle
> them. It also destroys guest state by doing so, namely the value of CR2.
>
> Sean attempted to fix that in KVM[1] but the patch was never applied.
>
> Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
> conceptually be disabled. Paolo even called out to add this very
> functionality to disable the quirk in QEMU[3]. So lets just do it.
>
> Add a new property 'hypercall-patching=on|off' to the KVM accelerator
> but keep the default behaviour as-is as there are, unfortunately,
> systems out there relying on the patching, e.g. KUT[4,5].
>
> For regular operating systems, however, the patching wouldn't be needed,
> nor work at all. If it would, these systrems would be vulnerable to
> memory corruption attacks, freely overwriting kernel code as they
> please.
For non-coco VMs, the systems are surely vulnerable to memory corruption
attacks that the host VMM is free to modify the guest memory. It's
irrelevant to whether hypercall patching is needed or works.
> [1] https://lore.kernel.org/kvm/20211210222903.3417968-1-seanjc@google.com/
> [2] https://lore.kernel.org/kvm/20220316005538.2282772-2-oupton@google.com/
> [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665-c00e4fe9cb9c@redhat.com/
> [4] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/f045ea5627a3/x86/apic.c#L644
> [5] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/blob/f045ea5627a3/x86/vmexit.c#L36
>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> ---
> Xiaoyao, I left out your Tested-by and Reviewed-by as I changed the code
> (slightly) and it didn't felt right to pick these up. However, as only
> the default value changed, the functionality would be the same if you
> tested both cases explicitly (-accel kvm,hypercall-patching={on,off}).
No problem, I just re-tested it.
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> v3:
> - switch default to 'on' to not change the default behaviour
> - reference KUT tests relying on hypercall patching
>
> v2:
> - rename hypercall_patching_enabled to hypercall_patching (Xiaoyao Li)
> - make use of error_setg*() (Xiaoyao Li)
>
> accel/kvm/kvm-all.c | 1 +
> include/system/kvm_int.h | 1 +
> qemu-options.hx | 10 +++++++++
> target/i386/kvm/kvm.c | 45 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 57 insertions(+)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 890d5ea9f865..a68f779b6c1c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3997,6 +3997,7 @@ static void kvm_accel_instance_init(Object *obj)
> s->kvm_dirty_ring_size = 0;
> s->kvm_dirty_ring_with_bitmap = false;
> s->kvm_eager_split_size = 0;
> + s->hypercall_patching = true;
> s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
> s->notify_window = 0;
> s->xen_version = 0;
> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
> index 9247493b0299..ec891ca8e302 100644
> --- a/include/system/kvm_int.h
> +++ b/include/system/kvm_int.h
> @@ -160,6 +160,7 @@ struct KVMState
> uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */
> struct KVMDirtyRingReaper reaper;
> struct KVMMsrEnergy msr_energy;
> + bool hypercall_patching;
> NotifyVmexitOption notify_vmexit;
> uint32_t notify_window;
> uint32_t xen_version;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index ab23f14d2178..98af1a91e6e6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -236,6 +236,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
> " dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
> " eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
> " notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
> + " hypercall-patching=on|off (disable KVM's VMCALL/VMMCALL hypercall patching quirk, x86 only)\n"
I would like to say "(configure KVM's VMCALL/VMCALL hypercall patching
quirk, x86 only)" instead of "disable"
> " thread=single|multi (enable multi-threaded TCG)\n"
> " device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL)
> SRST
> @@ -318,6 +319,15 @@ SRST
> open up for a specified of time (i.e. notify-window).
> Default: notify-vmexit=run,notify-window=0.
>
> + ``hypercall-patching=on|off``
> + KVM tries to recover from the wrong hypercall instruction being used by
> + a guest by attempting to rewrite it to the one supported natively by
> + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). However, this
> + patching may fail if the guest memory is write protected, leading to a
> + page fault getting propagated to the guest instead of an illegal
> + instruction exception. As this may confuse guests, this option allows
> + disabling it (x86 only, enabled by default).
> +
> ``device=path``
> Sets the path to the KVM device node. Defaults to ``/dev/kvm``. This
> option can be used to pass the KVM device to use via a file descriptor
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8d7..a841d53c240f 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3228,6 +3228,26 @@ static int kvm_vm_enable_energy_msrs(KVMState *s)
> return 0;
> }
>
> +static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp)
> +{
> + int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2);
> + int ret = -1;
> +
> + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) {
> + ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0,
> + KVM_X86_QUIRK_FIX_HYPERCALL_INSN);
> + if (ret) {
> + error_setg_errno(errp, -ret, "kvm: failed to disable "
> + "hypercall patching quirk: %s",
> + strerror(-ret));
> + }
> + } else {
> + error_setg(errp, "kvm: disabling hypercall patching not supported");
> + }
> +
> + return ret;
> +}
> +
> int kvm_arch_init(MachineState *ms, KVMState *s)
> {
> int ret;
> @@ -3367,6 +3387,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> }
> }
>
> + if (!s->hypercall_patching) {
> + if (kvm_vm_disable_hypercall_patching(s, &local_err)) {
> + error_report_err(local_err);
> + }
> + }
> +
> return 0;
> }
>
> @@ -6478,6 +6504,19 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
> }
> }
>
> +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp)
> +{
> + KVMState *s = KVM_STATE(obj);
> + return s->hypercall_patching;
> +}
> +
> +static void kvm_arch_set_hypercall_patching(Object *obj, bool value,
> + Error **errp)
> +{
> + KVMState *s = KVM_STATE(obj);
> + s->hypercall_patching = value;
> +}
> +
> static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp)
> {
> KVMState *s = KVM_STATE(obj);
> @@ -6611,6 +6650,12 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v,
>
> void kvm_arch_accel_class_init(ObjectClass *oc)
> {
> + object_class_property_add_bool(oc, "hypercall-patching",
> + kvm_arch_get_hypercall_patching,
> + kvm_arch_set_hypercall_patching);
> + object_class_property_set_description(oc, "hypercall-patching",
> + "Disable hypercall patching quirk");
Ditto, Could we use "Configure hypercall patching quirk"? It's not only
to disable it.
> object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
> &NotifyVmexitOption_lookup,
> kvm_arch_get_notify_vmexit,
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v3] i386/kvm: Provide knob to disable hypercall patching quirk
2025-08-04 3:32 ` Xiaoyao Li
@ 2025-08-04 9:16 ` Mathias Krause
0 siblings, 0 replies; 3+ messages in thread
From: Mathias Krause @ 2025-08-04 9:16 UTC (permalink / raw)
To: Xiaoyao Li, Paolo Bonzini, qemu-devel
Cc: Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson
On 04.08.25 05:32, Xiaoyao Li wrote:
> On 8/1/2025 9:12 PM, Mathias Krause wrote:
>> [...]
>>
>> For regular operating systems, however, the patching wouldn't be needed,
>> nor work at all. If it would, these systrems would be vulnerable to
>> memory corruption attacks, freely overwriting kernel code as they
>> please.
>
> For non-coco VMs, the systems are surely vulnerable to memory corruption
> attacks that the host VMM is free to modify the guest memory. It's
> irrelevant to whether hypercall patching is needed or works.
Sure, a VMM could mess with the guest's memory as it pleases. However, I
meant possible attacks from *within* the guest, as in allowing code
modifications to happen by having W+X mappings, allowing possibly
malicious modifications of such.
>> [...]
>> ---
>> Xiaoyao, I left out your Tested-by and Reviewed-by as I changed the code
>> (slightly) and it didn't felt right to pick these up. However, as only
>> the default value changed, the functionality would be the same if you
>> tested both cases explicitly (-accel kvm,hypercall-patching={on,off}).
>
> No problem, I just re-tested it.
>
> Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Thanks!
>> [...]
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index ab23f14d2178..98af1a91e6e6 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -236,6 +236,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>> " dirty-ring-size=n (KVM dirty ring GFN count,
>> default 0)\n"
>> " eager-split-size=n (KVM Eager Page Split chunk
>> size, default 0, disabled. ARM only)\n"
>> " notify-vmexit=run|internal-error|
>> disable,notify-window=n (enable notify VM exit and set notify window,
>> x86 only)\n"
>> + " hypercall-patching=on|off (disable KVM's VMCALL/
>> VMMCALL hypercall patching quirk, x86 only)\n"
>
> I would like to say "(configure KVM's VMCALL/VMCALL hypercall patching
> quirk, x86 only)" instead of "disable"
That would be technically correct. However, as this quirk is enabled by
default in KVM and QEMU, the only sensible configuration toggle is to
disable it. That's why I stated it this way. But I can rephrase it, if
you prefer it this way.
>> [...]
>> @@ -6611,6 +6650,12 @@ static void
>> kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v,
>> void kvm_arch_accel_class_init(ObjectClass *oc)
>> {
>> + object_class_property_add_bool(oc, "hypercall-patching",
>> + kvm_arch_get_hypercall_patching,
>> + kvm_arch_set_hypercall_patching);
>> + object_class_property_set_description(oc, "hypercall-patching",
>> + "Disable hypercall patching
>> quirk");
>
> Ditto, Could we use "Configure hypercall patching quirk"? It's not only
> to disable it.
Or just "Hypercall patching quirk", as the bool value already reflects
its state.
Thanks,
Mathias
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-08-04 9:17 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-01 13:12 [PATCH v3] i386/kvm: Provide knob to disable hypercall patching quirk Mathias Krause
2025-08-04 3:32 ` Xiaoyao Li
2025-08-04 9:16 ` Mathias Krause
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).