* [PATCH] i386/kvm: Disable hypercall patching quirk by default @ 2025-06-19 19:42 Mathias Krause 2025-07-21 10:22 ` Mathias Krause ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Mathias Krause @ 2025-06-19 19:42 UTC (permalink / raw) To: qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Mathias Krause, Oliver Upton, Sean Christopherson KVM has a weird behaviour when a guest executes VMCALL on an AMD system or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode exception (#UD) as they are just the wrong instruction for the CPU given. But instead of forwarding the exception to the guest, KVM tries to patch the guest instruction to match the host's actual hypercall instruction. That is doomed to fail as read-only code is rather the standard these days. But, instead of letting go the patching attempt and falling back to #UD injection, KVM injects the page fault instead. That's wrong on multiple levels. Not only isn't that a valid exception to be generated by these instructions, confusing attempts to handle them. It also destroys guest state by doing so, namely the value of CR2. Sean attempted to fix that in KVM[1] but the patch was never applied. Later, Oliver added a quirk bit in [2] so the behaviour can, at least, conceptually be disabled. Paolo even called out to add this very functionality to disable the quirk in QEMU[3]. So lets just do it. A new property 'hypercall-patching=on|off' is added, for the very unlikely case that there are setups that really need the patching. However, these would be vulnerable to memory corruption attacks freely overwriting code as they please. So, my guess is, there are exactly 0 systems out there requiring this quirk. [1] https://lore.kernel.org/kvm/20211210222903.3417968-1-seanjc@google.com/ [2] https://lore.kernel.org/kvm/20220316005538.2282772-2-oupton@google.com/ [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665-c00e4fe9cb9c@redhat.com/ Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> --- include/system/kvm_int.h | 1 + qemu-options.hx | 10 ++++++++++ target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 49 insertions(+) diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h index 756a3c0a250e..fd7129824429 100644 --- a/include/system/kvm_int.h +++ b/include/system/kvm_int.h @@ -159,6 +159,7 @@ struct KVMState uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */ struct KVMDirtyRingReaper reaper; struct KVMMsrEnergy msr_energy; + bool hypercall_patching_enabled; NotifyVmexitOption notify_vmexit; uint32_t notify_window; uint32_t xen_version; diff --git a/qemu-options.hx b/qemu-options.hx index 1f862b19a676..c2e232649c19 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, " dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" " eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" " notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" + " hypercall-patching=on|off (enable KVM's VMCALL/VMMCALL hypercall patching quirk, x86 only)\n" " thread=single|multi (enable multi-threaded TCG)\n" " device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL) SRST @@ -313,6 +314,15 @@ SRST open up for a specified of time (i.e. notify-window). Default: notify-vmexit=run,notify-window=0. + ``hypercall-patching=on|off`` + KVM tries to recover from the wrong hypercall instruction being used by + a guest by attempting to rewrite it to the one supported natively by + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). However, this + patching may fail if the guest memory is write protected, leading to a + page fault getting propagated to the guest instead of an illegal + instruction exception. As this may confuse guests, it gets disabled by + default (x86 only). + ``device=path`` Sets the path to the KVM device node. Defaults to ``/dev/kvm``. This option can be used to pass the KVM device to use via a file descriptor diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 56a6b9b6381a..6f5f3b95e553 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) return 0; } +static int kvm_vm_disable_hypercall_patching(KVMState *s) +{ + int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); + + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); + } + + warn_report("kvm: disabling hypercall patching not supported"); + return 0; +} + int kvm_arch_init(MachineState *ms, KVMState *s) { int ret; @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } + if (s->hypercall_patching_enabled == false) { + if (kvm_vm_disable_hypercall_patching(s)) { + warn_report("kvm: failed to disable hypercall patching quirk"); + } + } + return 0; } @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask) } } +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) +{ + KVMState *s = KVM_STATE(obj); + return s->hypercall_patching_enabled; +} + +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, + Error **errp) +{ + KVMState *s = KVM_STATE(obj); + s->hypercall_patching_enabled = value; +} + static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) { KVMState *s = KVM_STATE(obj); @@ -6589,6 +6621,12 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, void kvm_arch_accel_class_init(ObjectClass *oc) { + object_class_property_add_bool(oc, "hypercall-patching", + kvm_arch_get_hypercall_patching, + kvm_arch_set_hypercall_patching); + object_class_property_set_description(oc, "hypercall-patching", + "Enable hypercall patching quirk"); + object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption", &NotifyVmexitOption_lookup, kvm_arch_get_notify_vmexit, -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-06-19 19:42 [PATCH] i386/kvm: Disable hypercall patching quirk by default Mathias Krause @ 2025-07-21 10:22 ` Mathias Krause 2025-07-22 3:45 ` Xiaoyao Li 2025-07-23 9:26 ` David Woodhouse 2 siblings, 0 replies; 14+ messages in thread From: Mathias Krause @ 2025-07-21 10:22 UTC (permalink / raw) To: Paolo Bonzini, Marcelo Tosatti Cc: Oliver Upton, Sean Christopherson, qemu-devel, kvm On 19.06.25 21:42, Mathias Krause wrote: > KVM has a weird behaviour when a guest executes VMCALL on an AMD system > or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode > exception (#UD) as they are just the wrong instruction for the CPU > given. But instead of forwarding the exception to the guest, KVM tries > to patch the guest instruction to match the host's actual hypercall > instruction. That is doomed to fail as read-only code is rather the > standard these days. But, instead of letting go the patching attempt and > falling back to #UD injection, KVM injects the page fault instead. > > That's wrong on multiple levels. Not only isn't that a valid exception > to be generated by these instructions, confusing attempts to handle > them. It also destroys guest state by doing so, namely the value of CR2. > > Sean attempted to fix that in KVM[1] but the patch was never applied. > > Later, Oliver added a quirk bit in [2] so the behaviour can, at least, > conceptually be disabled. Paolo even called out to add this very > functionality to disable the quirk in QEMU[3]. So lets just do it. > > A new property 'hypercall-patching=on|off' is added, for the very > unlikely case that there are setups that really need the patching. > However, these would be vulnerable to memory corruption attacks freely > overwriting code as they please. So, my guess is, there are exactly 0 > systems out there requiring this quirk. > > [1] https://lore.kernel.org/kvm/20211210222903.3417968-1-seanjc@google.com/ > [2] https://lore.kernel.org/kvm/20220316005538.2282772-2-oupton@google.com/ > [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665-c00e4fe9cb9c@redhat.com/ > > Cc: Oliver Upton <oliver.upton@linux.dev> > Cc: Sean Christopherson <seanjc@google.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Mathias Krause <minipli@grsecurity.net> > --- > include/system/kvm_int.h | 1 + > qemu-options.hx | 10 ++++++++++ > target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 49 insertions(+) > > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > index 756a3c0a250e..fd7129824429 100644 > --- a/include/system/kvm_int.h > +++ b/include/system/kvm_int.h > @@ -159,6 +159,7 @@ struct KVMState > uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */ > struct KVMDirtyRingReaper reaper; > struct KVMMsrEnergy msr_energy; > + bool hypercall_patching_enabled; > NotifyVmexitOption notify_vmexit; > uint32_t notify_window; > uint32_t xen_version; > diff --git a/qemu-options.hx b/qemu-options.hx > index 1f862b19a676..c2e232649c19 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, > " dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" > " eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" > " notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" > + " hypercall-patching=on|off (enable KVM's VMCALL/VMMCALL hypercall patching quirk, x86 only)\n" > " thread=single|multi (enable multi-threaded TCG)\n" > " device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL) > SRST > @@ -313,6 +314,15 @@ SRST > open up for a specified of time (i.e. notify-window). > Default: notify-vmexit=run,notify-window=0. > > + ``hypercall-patching=on|off`` > + KVM tries to recover from the wrong hypercall instruction being used by > + a guest by attempting to rewrite it to the one supported natively by > + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). However, this > + patching may fail if the guest memory is write protected, leading to a > + page fault getting propagated to the guest instead of an illegal > + instruction exception. As this may confuse guests, it gets disabled by > + default (x86 only). > + > ``device=path`` > Sets the path to the KVM device node. Defaults to ``/dev/kvm``. This > option can be used to pass the KVM device to use via a file descriptor > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 56a6b9b6381a..6f5f3b95e553 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) > return 0; > } > > +static int kvm_vm_disable_hypercall_patching(KVMState *s) > +{ > + int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); > + > + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { > + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + } > + > + warn_report("kvm: disabling hypercall patching not supported"); > + return 0; > +} > + > int kvm_arch_init(MachineState *ms, KVMState *s) > { > int ret; > @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > } > > + if (s->hypercall_patching_enabled == false) { > + if (kvm_vm_disable_hypercall_patching(s)) { > + warn_report("kvm: failed to disable hypercall patching quirk"); > + } > + } > + > return 0; > } > > @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask) > } > } > > +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) > +{ > + KVMState *s = KVM_STATE(obj); > + return s->hypercall_patching_enabled; > +} > + > +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, > + Error **errp) > +{ > + KVMState *s = KVM_STATE(obj); > + s->hypercall_patching_enabled = value; > +} > + > static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) > { > KVMState *s = KVM_STATE(obj); > @@ -6589,6 +6621,12 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, > > void kvm_arch_accel_class_init(ObjectClass *oc) > { > + object_class_property_add_bool(oc, "hypercall-patching", > + kvm_arch_get_hypercall_patching, > + kvm_arch_set_hypercall_patching); > + object_class_property_set_description(oc, "hypercall-patching", > + "Enable hypercall patching quirk"); > + > object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption", > &NotifyVmexitOption_lookup, > kvm_arch_get_notify_vmexit, Ping! Paolo, can we get this weird and unexpected behaviour of KVM get disabled by default, please? Thanks, Mathias ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-06-19 19:42 [PATCH] i386/kvm: Disable hypercall patching quirk by default Mathias Krause 2025-07-21 10:22 ` Mathias Krause @ 2025-07-22 3:45 ` Xiaoyao Li 2025-07-22 9:21 ` Mathias Krause 2025-07-23 9:26 ` David Woodhouse 2 siblings, 1 reply; 14+ messages in thread From: Xiaoyao Li @ 2025-07-22 3:45 UTC (permalink / raw) To: Mathias Krause, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 6/20/2025 3:42 AM, Mathias Krause wrote: > KVM has a weird behaviour when a guest executes VMCALL on an AMD system > or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode > exception (#UD) as they are just the wrong instruction for the CPU > given. But instead of forwarding the exception to the guest, KVM tries > to patch the guest instruction to match the host's actual hypercall > instruction. That is doomed to fail as read-only code is rather the > standard these days. But, instead of letting go the patching attempt and > falling back to #UD injection, KVM injects the page fault instead. > > That's wrong on multiple levels. Not only isn't that a valid exception > to be generated by these instructions, confusing attempts to handle > them. It also destroys guest state by doing so, namely the value of CR2. > > Sean attempted to fix that in KVM[1] but the patch was never applied. > > Later, Oliver added a quirk bit in [2] so the behaviour can, at least, > conceptually be disabled. Paolo even called out to add this very > functionality to disable the quirk in QEMU[3]. So lets just do it. > > A new property 'hypercall-patching=on|off' is added, for the very > unlikely case that there are setups that really need the patching. > However, these would be vulnerable to memory corruption attacks freely > overwriting code as they please. So, my guess is, there are exactly 0 > systems out there requiring this quirk. The default behavior is patching the hypercall for many years. If you desire to change the default behavior, please at least keep it unchanged for old machine version. i.e., introduce compat_property, which sets KVMState->hypercall_patching_enabled to true. > [1] https://lore.kernel.org/kvm/20211210222903.3417968-1-seanjc@google.com/ > [2] https://lore.kernel.org/kvm/20220316005538.2282772-2-oupton@google.com/ > [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665-c00e4fe9cb9c@redhat.com/ > > Cc: Oliver Upton <oliver.upton@linux.dev> > Cc: Sean Christopherson <seanjc@google.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Mathias Krause <minipli@grsecurity.net> > --- > include/system/kvm_int.h | 1 + > qemu-options.hx | 10 ++++++++++ > target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 49 insertions(+) > > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h > index 756a3c0a250e..fd7129824429 100644 > --- a/include/system/kvm_int.h > +++ b/include/system/kvm_int.h > @@ -159,6 +159,7 @@ struct KVMState > uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk size */ > struct KVMDirtyRingReaper reaper; > struct KVMMsrEnergy msr_energy; > + bool hypercall_patching_enabled; IMHO, we can just name it "hypercall_patching". Since it's a boolean type, true means enabled and false means disabled. > NotifyVmexitOption notify_vmexit; > uint32_t notify_window; > uint32_t xen_version; > diff --git a/qemu-options.hx b/qemu-options.hx > index 1f862b19a676..c2e232649c19 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, > " dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n" > " eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n" > " notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n" > + " hypercall-patching=on|off (enable KVM's VMCALL/VMMCALL hypercall patching quirk, x86 only)\n" > " thread=single|multi (enable multi-threaded TCG)\n" > " device=path (KVM device path, default /dev/kvm)\n", QEMU_ARCH_ALL) > SRST > @@ -313,6 +314,15 @@ SRST > open up for a specified of time (i.e. notify-window). > Default: notify-vmexit=run,notify-window=0. > > + ``hypercall-patching=on|off`` > + KVM tries to recover from the wrong hypercall instruction being used by > + a guest by attempting to rewrite it to the one supported natively by > + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). However, this > + patching may fail if the guest memory is write protected, leading to a > + page fault getting propagated to the guest instead of an illegal > + instruction exception. As this may confuse guests, it gets disabled by > + default (x86 only). > + > ``device=path`` > Sets the path to the KVM device node. Defaults to ``/dev/kvm``. This > option can be used to pass the KVM device to use via a file descriptor > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 56a6b9b6381a..6f5f3b95e553 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) > return 0; > } > > +static int kvm_vm_disable_hypercall_patching(KVMState *s) > +{ > + int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); > + > + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { > + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + } > + > + warn_report("kvm: disabling hypercall patching not supported"); It's not clear it's 1) KVM doesn't support/has FIX_HYPERCALL_INSN quirk or 2) KVM has FIX_HYPERCALL_INSN quirk but doesn't allow it to be disabled, when KVM_X86_QUIRK_FIX_HYPERCALL_INSN is not returned in KVM_CAP_DISABLE_QUIRKS2. If it's case 1), it can be treated as hypercall patching is disabled thus no warning is expected. So, I think it requires a new cap in KVM to return the enabled quirks. > + return 0; I think return 0 here is to avoid the warn_report() in the caller. But for the correct semantics, we need to return -1 to indicate that it fails to disable the hypercall patching? > +} > + > int kvm_arch_init(MachineState *ms, KVMState *s) > { > int ret; > @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > } > > + if (s->hypercall_patching_enabled == false) { > + if (kvm_vm_disable_hypercall_patching(s)) { > + warn_report("kvm: failed to disable hypercall patching quirk"); > + } > + } > + > return 0; > } > > @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask) > } > } > > +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) > +{ > + KVMState *s = KVM_STATE(obj); > + return s->hypercall_patching_enabled; > +} > + > +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, > + Error **errp) > +{ > + KVMState *s = KVM_STATE(obj); > + s->hypercall_patching_enabled = value; > +} > + > static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) > { > KVMState *s = KVM_STATE(obj); > @@ -6589,6 +6621,12 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, > > void kvm_arch_accel_class_init(ObjectClass *oc) > { > + object_class_property_add_bool(oc, "hypercall-patching", > + kvm_arch_get_hypercall_patching, > + kvm_arch_set_hypercall_patching); > + object_class_property_set_description(oc, "hypercall-patching", > + "Enable hypercall patching quirk"); > + > object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption", > &NotifyVmexitOption_lookup, > kvm_arch_get_notify_vmexit, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 3:45 ` Xiaoyao Li @ 2025-07-22 9:21 ` Mathias Krause 2025-07-22 10:27 ` Xiaoyao Li 0 siblings, 1 reply; 14+ messages in thread From: Mathias Krause @ 2025-07-22 9:21 UTC (permalink / raw) To: Xiaoyao Li, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 22.07.25 05:45, Xiaoyao Li wrote: > On 6/20/2025 3:42 AM, Mathias Krause wrote: >> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >> exception (#UD) as they are just the wrong instruction for the CPU >> given. But instead of forwarding the exception to the guest, KVM tries >> to patch the guest instruction to match the host's actual hypercall >> instruction. That is doomed to fail as read-only code is rather the >> standard these days. But, instead of letting go the patching attempt and >> falling back to #UD injection, KVM injects the page fault instead. >> >> That's wrong on multiple levels. Not only isn't that a valid exception >> to be generated by these instructions, confusing attempts to handle >> them. It also destroys guest state by doing so, namely the value of CR2. >> >> Sean attempted to fix that in KVM[1] but the patch was never applied. >> >> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >> conceptually be disabled. Paolo even called out to add this very >> functionality to disable the quirk in QEMU[3]. So lets just do it. >> >> A new property 'hypercall-patching=on|off' is added, for the very >> unlikely case that there are setups that really need the patching. >> However, these would be vulnerable to memory corruption attacks freely >> overwriting code as they please. So, my guess is, there are exactly 0 >> systems out there requiring this quirk. > > The default behavior is patching the hypercall for many years. > > If you desire to change the default behavior, please at least keep it > unchanged for old machine version. i.e., introduce compat_property, > which sets KVMState->hypercall_patching_enabled to true. Well, the thing is, KVM's patching is done with the effective permissions of the guest which means, if the code in question isn't writable from the guest's point of view, KVM's attempt to modify it will fail. This failure isn't transparent for the guest as it sees a #PF instead of a #UD, and that's what I'm trying to fix by disabling the quirk. The hypercall patching was introduced in Linux commit 7aa81cc04781 ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then it was based on a dedicated hypercall page that was handled by KVM to use the proper instruction of the KVM module in use (VMX or SVM). Patching code was fine back then, but the introduction of DEBUG_RO_DATA made the patching attempts fail and, ultimately, lead to Paolo handle this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only"). However, his change still doesn't account for the cross-vendor live migration case (Intel<->AMD), which will still be broken, causing the before mentioned bogus #PF, which will just lead to misleading Oops reports, confusing the poor souls, trying to make sense of it. IMHO, there is no valid reason for still having the patching in place as the .text of non-ancient kernel's will be write-protected, making patching attempts fail. And, as they fail with a #PF instead of #UD, the guest cannot even handle them appropriately, as there was no memory write attempt from its point of view. Therefore the default should be to disable it, IMO. This won't prevent guests making use of the wrong instruction from trapping, but, at least, now they'll get the correct exception vector and can handle it appropriately. > >> [1] https://lore.kernel.org/kvm/20211210222903.3417968-1- >> seanjc@google.com/ >> [2] https://lore.kernel.org/kvm/20220316005538.2282772-2- >> oupton@google.com/ >> [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665- >> c00e4fe9cb9c@redhat.com/ >> >> Cc: Oliver Upton <oliver.upton@linux.dev> >> Cc: Sean Christopherson <seanjc@google.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Mathias Krause <minipli@grsecurity.net> >> --- >> include/system/kvm_int.h | 1 + >> qemu-options.hx | 10 ++++++++++ >> target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 49 insertions(+) >> >> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h >> index 756a3c0a250e..fd7129824429 100644 >> --- a/include/system/kvm_int.h >> +++ b/include/system/kvm_int.h >> @@ -159,6 +159,7 @@ struct KVMState >> uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk >> size */ >> struct KVMDirtyRingReaper reaper; >> struct KVMMsrEnergy msr_energy; >> + bool hypercall_patching_enabled; > > IMHO, we can just name it "hypercall_patching". > > Since it's a boolean type, true means enabled and false means disabled. Ok, makes sense. > >> NotifyVmexitOption notify_vmexit; >> uint32_t notify_window; >> uint32_t xen_version; >> diff --git a/qemu-options.hx b/qemu-options.hx >> index 1f862b19a676..c2e232649c19 100644 >> --- a/qemu-options.hx >> +++ b/qemu-options.hx >> @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, >> " dirty-ring-size=n (KVM dirty ring GFN count, >> default 0)\n" >> " eager-split-size=n (KVM Eager Page Split chunk >> size, default 0, disabled. ARM only)\n" >> " notify-vmexit=run|internal-error| >> disable,notify-window=n (enable notify VM exit and set notify window, >> x86 only)\n" >> + " hypercall-patching=on|off (enable KVM's VMCALL/ >> VMMCALL hypercall patching quirk, x86 only)\n" >> " thread=single|multi (enable multi-threaded TCG)\n" >> " device=path (KVM device path, default /dev/ >> kvm)\n", QEMU_ARCH_ALL) >> SRST >> @@ -313,6 +314,15 @@ SRST >> open up for a specified of time (i.e. notify-window). >> Default: notify-vmexit=run,notify-window=0. >> + ``hypercall-patching=on|off`` >> + KVM tries to recover from the wrong hypercall instruction >> being used by >> + a guest by attempting to rewrite it to the one supported >> natively by >> + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). >> However, this >> + patching may fail if the guest memory is write protected, >> leading to a >> + page fault getting propagated to the guest instead of an illegal >> + instruction exception. As this may confuse guests, it gets >> disabled by >> + default (x86 only). >> + >> ``device=path`` >> Sets the path to the KVM device node. Defaults to ``/dev/ >> kvm``. This >> option can be used to pass the KVM device to use via a file >> descriptor >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >> index 56a6b9b6381a..6f5f3b95e553 100644 >> --- a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) >> return 0; >> } >> +static int kvm_vm_disable_hypercall_patching(KVMState *s) >> +{ >> + int valid_quirks = kvm_vm_check_extension(s, >> KVM_CAP_DISABLE_QUIRKS2); >> + >> + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { >> + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, >> + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); >> + } >> + >> + warn_report("kvm: disabling hypercall patching not supported"); > > It's not clear it's 1) KVM doesn't support/has FIX_HYPERCALL_INSN quirk > or 2) KVM has FIX_HYPERCALL_INSN quirk but doesn't allow it to be > disabled, when KVM_X86_QUIRK_FIX_HYPERCALL_INSN is not returned in > KVM_CAP_DISABLE_QUIRKS2. > > If it's case 1), it can be treated as hypercall patching is disabled > thus no warning is expected. > > So, I think it requires a new cap in KVM to return the enabled quirks. KVM_CAP_DISABLE_QUIRKS2 fixes that bug of KVM_CAP_DISABLE_QUIRKS by doing just that: returning the mask of supported quirks when queried via KVM_CHECK_EXTENSION. So if KVM_X86_QUIRK_FIX_HYPERCALL_INSN is in that mask, it can also be disabled. If that attempt fails (for whatever reason), it's an error, which makes kvm_vm_enable_cap() return a non-zero value, triggering the warn_report("kvm: failed to disable hypercall patching quirk") in the caller. If KVM_X86_QUIRK_FIX_HYPERCALL_INSN is missing in the KVM_CAP_DISABLE_QUIRKS2 mask, it may either be that KVM is too old to even have the hypercall patching (pre-v2.6.25) or does do the patching, just doesn't have KVM_X86_QUIRK_FIX_HYPERCALL_INSN yet, which came in Linux commit f1a9761fbb00 ("KVM: x86: Allow userspace to opt out of hypercall patching"), which is v5.19. Ignoring pre-v2.6.25 kernels for a moment, we can assume that KVM will do the patching. So the lack of KVM_X86_QUIRK_FIX_HYPERCALL_INSN but having 'hypercall_patching_enabled == false' indicates that the user wants to disable it but QEMU cannot do so, because KVM lacks the extension to do so. This, IMO, legitimizes the warn_report("kvm: disabling hypercall patching not supported") -- as it's not supported. > >> + return 0; > > I think return 0 here is to avoid the warn_report() in the caller. But > for the correct semantics, we need to return -1 to indicate that it > fails to disable the hypercall patching? No, returning 0 here is very much on purpose, as you noticed, to avoid the warn_report() in the caller. The already issued warn_report() is the correct one for this case. I guess, it's a question of if disabling hypercall patching is a hard requirement or can soft-fail. I decided for the latter, as hypercall patching shouldn't be needed in most cases, so if it cannot be disabled, it's mostly fine to start the VM still. > >> +} >> + >> int kvm_arch_init(MachineState *ms, KVMState *s) >> { >> int ret; >> @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) >> } >> } >> + if (s->hypercall_patching_enabled == false) { >> + if (kvm_vm_disable_hypercall_patching(s)) { >> + warn_report("kvm: failed to disable hypercall patching >> quirk"); >> + } >> + } >> + >> return 0; >> } >> @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU >> *cpu, uint64_t mask) >> } >> } >> +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) >> +{ >> + KVMState *s = KVM_STATE(obj); >> + return s->hypercall_patching_enabled; >> +} >> + >> +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, >> + Error **errp) >> +{ >> + KVMState *s = KVM_STATE(obj); >> + s->hypercall_patching_enabled = value; >> +} >> + >> static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) >> { >> KVMState *s = KVM_STATE(obj); >> @@ -6589,6 +6621,12 @@ static void >> kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, >> void kvm_arch_accel_class_init(ObjectClass *oc) >> { >> + object_class_property_add_bool(oc, "hypercall-patching", >> + kvm_arch_get_hypercall_patching, >> + kvm_arch_set_hypercall_patching); >> + object_class_property_set_description(oc, "hypercall-patching", >> + "Enable hypercall patching >> quirk"); >> + >> object_class_property_add_enum(oc, "notify-vmexit", >> "NotifyVMexitOption", >> &NotifyVmexitOption_lookup, >> kvm_arch_get_notify_vmexit, > Thanks, Mathias ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 9:21 ` Mathias Krause @ 2025-07-22 10:27 ` Xiaoyao Li 2025-07-22 10:35 ` Mohamed Mediouni ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Xiaoyao Li @ 2025-07-22 10:27 UTC (permalink / raw) To: Mathias Krause, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 7/22/2025 5:21 PM, Mathias Krause wrote: > On 22.07.25 05:45, Xiaoyao Li wrote: >> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>> exception (#UD) as they are just the wrong instruction for the CPU >>> given. But instead of forwarding the exception to the guest, KVM tries >>> to patch the guest instruction to match the host's actual hypercall >>> instruction. That is doomed to fail as read-only code is rather the >>> standard these days. But, instead of letting go the patching attempt and >>> falling back to #UD injection, KVM injects the page fault instead. >>> >>> That's wrong on multiple levels. Not only isn't that a valid exception >>> to be generated by these instructions, confusing attempts to handle >>> them. It also destroys guest state by doing so, namely the value of CR2. >>> >>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>> >>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>> conceptually be disabled. Paolo even called out to add this very >>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>> >>> A new property 'hypercall-patching=on|off' is added, for the very >>> unlikely case that there are setups that really need the patching. >>> However, these would be vulnerable to memory corruption attacks freely >>> overwriting code as they please. So, my guess is, there are exactly 0 >>> systems out there requiring this quirk. >> >> The default behavior is patching the hypercall for many years. >> >> If you desire to change the default behavior, please at least keep it >> unchanged for old machine version. i.e., introduce compat_property, >> which sets KVMState->hypercall_patching_enabled to true. > > Well, the thing is, KVM's patching is done with the effective > permissions of the guest which means, if the code in question isn't > writable from the guest's point of view, KVM's attempt to modify it will > fail. This failure isn't transparent for the guest as it sees a #PF > instead of a #UD, and that's what I'm trying to fix by disabling the quirk. > > The hypercall patching was introduced in Linux commit 7aa81cc04781 > ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then > it was based on a dedicated hypercall page that was handled by KVM to > use the proper instruction of the KVM module in use (VMX or SVM). > > Patching code was fine back then, but the introduction of DEBUG_RO_DATA > made the patching attempts fail and, ultimately, lead to Paolo handle > this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL > vs. VMMCALL if kernel text is read-only"). > > However, his change still doesn't account for the cross-vendor live > migration case (Intel<->AMD), which will still be broken, causing the > before mentioned bogus #PF, which will just lead to misleading Oops > reports, confusing the poor souls, trying to make sense of it. > > IMHO, there is no valid reason for still having the patching in place as > the .text of non-ancient kernel's will be write-protected, making > patching attempts fail. And, as they fail with a #PF instead of #UD, the > guest cannot even handle them appropriately, as there was no memory > write attempt from its point of view. Therefore the default should be to > disable it, IMO. This won't prevent guests making use of the wrong > instruction from trapping, but, at least, now they'll get the correct > exception vector and can handle it appropriately. But you don't accout for the case that guest kernel is built without CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for whatever reason the guest's text is not readonly, and the VM needs to be migrated among different vendors (Intel <-> AMD). Before this patch, the above usecase works well. But with this patch, the guest will gets #UD after migrated to different vendors. I heard from some small CSPs that they do want to the ability to live migrate VMs among Intel and AMD host. >> >>> [1] https://lore.kernel.org/kvm/20211210222903.3417968-1- >>> seanjc@google.com/ >>> [2] https://lore.kernel.org/kvm/20220316005538.2282772-2- >>> oupton@google.com/ >>> [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665- >>> c00e4fe9cb9c@redhat.com/ >>> >>> Cc: Oliver Upton <oliver.upton@linux.dev> >>> Cc: Sean Christopherson <seanjc@google.com> >>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>> Signed-off-by: Mathias Krause <minipli@grsecurity.net> >>> --- >>> include/system/kvm_int.h | 1 + >>> qemu-options.hx | 10 ++++++++++ >>> target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ >>> 3 files changed, 49 insertions(+) >>> >>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h >>> index 756a3c0a250e..fd7129824429 100644 >>> --- a/include/system/kvm_int.h >>> +++ b/include/system/kvm_int.h >>> @@ -159,6 +159,7 @@ struct KVMState >>> uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk >>> size */ >>> struct KVMDirtyRingReaper reaper; >>> struct KVMMsrEnergy msr_energy; >>> + bool hypercall_patching_enabled; >> >> IMHO, we can just name it "hypercall_patching". >> >> Since it's a boolean type, true means enabled and false means disabled. > > Ok, makes sense. > >> >>> NotifyVmexitOption notify_vmexit; >>> uint32_t notify_window; >>> uint32_t xen_version; >>> diff --git a/qemu-options.hx b/qemu-options.hx >>> index 1f862b19a676..c2e232649c19 100644 >>> --- a/qemu-options.hx >>> +++ b/qemu-options.hx >>> @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, >>> " dirty-ring-size=n (KVM dirty ring GFN count, >>> default 0)\n" >>> " eager-split-size=n (KVM Eager Page Split chunk >>> size, default 0, disabled. ARM only)\n" >>> " notify-vmexit=run|internal-error| >>> disable,notify-window=n (enable notify VM exit and set notify window, >>> x86 only)\n" >>> + " hypercall-patching=on|off (enable KVM's VMCALL/ >>> VMMCALL hypercall patching quirk, x86 only)\n" >>> " thread=single|multi (enable multi-threaded TCG)\n" >>> " device=path (KVM device path, default /dev/ >>> kvm)\n", QEMU_ARCH_ALL) >>> SRST >>> @@ -313,6 +314,15 @@ SRST >>> open up for a specified of time (i.e. notify-window). >>> Default: notify-vmexit=run,notify-window=0. >>> + ``hypercall-patching=on|off`` >>> + KVM tries to recover from the wrong hypercall instruction >>> being used by >>> + a guest by attempting to rewrite it to the one supported >>> natively by >>> + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). >>> However, this >>> + patching may fail if the guest memory is write protected, >>> leading to a >>> + page fault getting propagated to the guest instead of an illegal >>> + instruction exception. As this may confuse guests, it gets >>> disabled by >>> + default (x86 only). >>> + >>> ``device=path`` >>> Sets the path to the KVM device node. Defaults to ``/dev/ >>> kvm``. This >>> option can be used to pass the KVM device to use via a file >>> descriptor >>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>> index 56a6b9b6381a..6f5f3b95e553 100644 >>> --- a/target/i386/kvm/kvm.c >>> +++ b/target/i386/kvm/kvm.c >>> @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) >>> return 0; >>> } >>> +static int kvm_vm_disable_hypercall_patching(KVMState *s) >>> +{ >>> + int valid_quirks = kvm_vm_check_extension(s, >>> KVM_CAP_DISABLE_QUIRKS2); >>> + >>> + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { >>> + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, >>> + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); >>> + } >>> + >>> + warn_report("kvm: disabling hypercall patching not supported"); >> >> It's not clear it's 1) KVM doesn't support/has FIX_HYPERCALL_INSN quirk >> or 2) KVM has FIX_HYPERCALL_INSN quirk but doesn't allow it to be >> disabled, when KVM_X86_QUIRK_FIX_HYPERCALL_INSN is not returned in >> KVM_CAP_DISABLE_QUIRKS2. >> >> If it's case 1), it can be treated as hypercall patching is disabled >> thus no warning is expected. >> >> So, I think it requires a new cap in KVM to return the enabled quirks. > > KVM_CAP_DISABLE_QUIRKS2 fixes that bug of KVM_CAP_DISABLE_QUIRKS by > doing just that: returning the mask of supported quirks when queried via > KVM_CHECK_EXTENSION. So if KVM_X86_QUIRK_FIX_HYPERCALL_INSN is in that > mask, it can also be disabled. If that attempt fails (for whatever > reason), it's an error, which makes kvm_vm_enable_cap() return a > non-zero value, triggering the warn_report("kvm: failed to disable > hypercall patching quirk") in the caller. > > If KVM_X86_QUIRK_FIX_HYPERCALL_INSN is missing in the > KVM_CAP_DISABLE_QUIRKS2 mask, it may either be that KVM is too old to > even have the hypercall patching (pre-v2.6.25) or does do the patching, > just doesn't have KVM_X86_QUIRK_FIX_HYPERCALL_INSN yet, which came in > Linux commit f1a9761fbb00 ("KVM: x86: Allow userspace to opt out of > hypercall patching"), which is v5.19. > > Ignoring pre-v2.6.25 kernels for a moment, we can assume that KVM will > do the patching. So the lack of KVM_X86_QUIRK_FIX_HYPERCALL_INSN but > having 'hypercall_patching_enabled == false' indicates that the user > wants to disable it but QEMU cannot do so, because KVM lacks the > extension to do so. This, IMO, legitimizes the warn_report("kvm: > disabling hypercall patching not supported") -- as it's not supported. The minimum supported kernel version is 4.5 (see commit f180e367fce4). So pre-v2.6.25 kernels is not the case. Surely we can list all the cases of different versions of KVM starting from v4.5 and draw the conclusion that the semantics of "valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN == 0" means KVM enables the hypercall patching quirk but don't provide the interface for userspace to disable it. So the code logic is correct. My statement of "I think it requires a new cap in KVM to return the enabled quirks" is more for generic consideration. i.e., QEMU can know whether a quirk is enabled or not without analysing the detailed history of KVM. Of course, it's more of the requirement on KVM to provide new interface and Current QEMU can do nothing on it. That is, current implementation of this PATCH is OK to me. >> >>> + return 0; >> >> I think return 0 here is to avoid the warn_report() in the caller. But >> for the correct semantics, we need to return -1 to indicate that it >> fails to disable the hypercall patching? > > No, returning 0 here is very much on purpose, as you noticed, to avoid > the warn_report() in the caller. The already issued warn_report() is the > correct one for this case. We can use @Error to pass the error log instead of the trick on return value. e.g., --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -3228,17 +3228,24 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) return 0; } -static int kvm_vm_disable_hypercall_patching(KVMState *s) +static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp) { int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); + int ret = -1; if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { - return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, - KVM_X86_QUIRK_FIX_HYPERCALL_INSN); + ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); + if (ret) { + error_setg_errno(errp, -ret, "kvm: failed to disable " + "hypercall patching quirk: %s", + strerror(-ret)); + } + } else { + error_setg(errp, "kvm: disabling hypercall patching not supported"); } - warn_report("kvm: disabling hypercall patching not supported"); - return 0; + return ret; } int kvm_arch_init(MachineState *ms, KVMState *s) @@ -3381,8 +3388,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } if (s->hypercall_patching_enabled == false) { - if (kvm_vm_disable_hypercall_patching(s)) { - warn_report("kvm: failed to disable hypercall patching quirk"); + if (kvm_vm_disable_hypercall_patching(s, &local_err)) { + error_report_err(local_err); } } > I guess, it's a question of if disabling hypercall patching is a hard > requirement or can soft-fail. I decided for the latter, as hypercall > patching shouldn't be needed in most cases, so if it cannot be disabled, > it's mostly fine to start the VM still. > >> >>> +} >>> + >>> int kvm_arch_init(MachineState *ms, KVMState *s) >>> { >>> int ret; >>> @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) >>> } >>> } >>> + if (s->hypercall_patching_enabled == false) { >>> + if (kvm_vm_disable_hypercall_patching(s)) { >>> + warn_report("kvm: failed to disable hypercall patching >>> quirk"); >>> + } >>> + } >>> + >>> return 0; >>> } >>> @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU >>> *cpu, uint64_t mask) >>> } >>> } >>> +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) >>> +{ >>> + KVMState *s = KVM_STATE(obj); >>> + return s->hypercall_patching_enabled; >>> +} >>> + >>> +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, >>> + Error **errp) >>> +{ >>> + KVMState *s = KVM_STATE(obj); >>> + s->hypercall_patching_enabled = value; >>> +} >>> + >>> static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) >>> { >>> KVMState *s = KVM_STATE(obj); >>> @@ -6589,6 +6621,12 @@ static void >>> kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, >>> void kvm_arch_accel_class_init(ObjectClass *oc) >>> { >>> + object_class_property_add_bool(oc, "hypercall-patching", >>> + kvm_arch_get_hypercall_patching, >>> + kvm_arch_set_hypercall_patching); >>> + object_class_property_set_description(oc, "hypercall-patching", >>> + "Enable hypercall patching >>> quirk"); >>> + >>> object_class_property_add_enum(oc, "notify-vmexit", >>> "NotifyVMexitOption", >>> &NotifyVmexitOption_lookup, >>> kvm_arch_get_notify_vmexit, >> > > Thanks, > Mathias ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 10:27 ` Xiaoyao Li @ 2025-07-22 10:35 ` Mohamed Mediouni 2025-07-22 11:06 ` Xiaoyao Li 2025-07-22 11:15 ` Daniel P. Berrangé 2025-07-22 20:08 ` Mathias Krause 2 siblings, 1 reply; 14+ messages in thread From: Mohamed Mediouni @ 2025-07-22 10:35 UTC (permalink / raw) To: Xiaoyao Li Cc: Mathias Krause, qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson > On 22. Jul 2025, at 12:27, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 7/22/2025 5:21 PM, Mathias Krause wrote: >> On 22.07.25 05:45, Xiaoyao Li wrote: >>> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>>> exception (#UD) as they are just the wrong instruction for the CPU >>>> given. But instead of forwarding the exception to the guest, KVM tries >>>> to patch the guest instruction to match the host's actual hypercall >>>> instruction. That is doomed to fail as read-only code is rather the >>>> standard these days. But, instead of letting go the patching attempt and >>>> falling back to #UD injection, KVM injects the page fault instead. >>>> >>>> That's wrong on multiple levels. Not only isn't that a valid exception >>>> to be generated by these instructions, confusing attempts to handle >>>> them. It also destroys guest state by doing so, namely the value of CR2. >>>> >>>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>>> >>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>>> conceptually be disabled. Paolo even called out to add this very >>>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>>> >>>> A new property 'hypercall-patching=on|off' is added, for the very >>>> unlikely case that there are setups that really need the patching. >>>> However, these would be vulnerable to memory corruption attacks freely >>>> overwriting code as they please. So, my guess is, there are exactly 0 >>>> systems out there requiring this quirk. >>> >>> The default behavior is patching the hypercall for many years. >>> >>> If you desire to change the default behavior, please at least keep it >>> unchanged for old machine version. i.e., introduce compat_property, >>> which sets KVMState->hypercall_patching_enabled to true. >> Well, the thing is, KVM's patching is done with the effective >> permissions of the guest which means, if the code in question isn't >> writable from the guest's point of view, KVM's attempt to modify it will >> fail. This failure isn't transparent for the guest as it sees a #PF >> instead of a #UD, and that's what I'm trying to fix by disabling the quirk. >> The hypercall patching was introduced in Linux commit 7aa81cc04781 >> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then >> it was based on a dedicated hypercall page that was handled by KVM to >> use the proper instruction of the KVM module in use (VMX or SVM). >> Patching code was fine back then, but the introduction of DEBUG_RO_DATA >> made the patching attempts fail and, ultimately, lead to Paolo handle >> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL >> vs. VMMCALL if kernel text is read-only"). >> However, his change still doesn't account for the cross-vendor live >> migration case (Intel<->AMD), which will still be broken, causing the >> before mentioned bogus #PF, which will just lead to misleading Oops >> reports, confusing the poor souls, trying to make sense of it. >> IMHO, there is no valid reason for still having the patching in place as >> the .text of non-ancient kernel's will be write-protected, making >> patching attempts fail. And, as they fail with a #PF instead of #UD, the >> guest cannot even handle them appropriately, as there was no memory >> write attempt from its point of view. Therefore the default should be to >> disable it, IMO. This won't prevent guests making use of the wrong >> instruction from trapping, but, at least, now they'll get the correct >> exception vector and can handle it appropriately. > > But you don't accout for the case that guest kernel is built without CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for whatever reason the guest's text is not readonly, and the VM needs to be migrated among different vendors (Intel <-> AMD). > > Before this patch, the above usecase works well. But with this patch, the guest will gets #UD after migrated to different vendors. > > I heard from some small CSPs that they do want to the ability to live migrate VMs among Intel and AMD host. > Is there a particular reason to not handle that #UD as a hypercall to support that scenario? Especially with some guest OS kernels having kernel patch protection with periodic scrubbing of r/o memory (ie Windows), doesn’t sound great to override anything in a way the guest OS kernel can notice… -Mohamed >>> >>>> [1] https://lore.kernel.org/kvm/20211210222903.3417968-1- >>>> seanjc@google.com/ >>>> [2] https://lore.kernel.org/kvm/20220316005538.2282772-2- >>>> oupton@google.com/ >>>> [3] https://lore.kernel.org/kvm/80e1f1d2-2d79-22b7-6665- >>>> c00e4fe9cb9c@redhat.com/ >>>> >>>> Cc: Oliver Upton <oliver.upton@linux.dev> >>>> Cc: Sean Christopherson <seanjc@google.com> >>>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>>> Signed-off-by: Mathias Krause <minipli@grsecurity.net> >>>> --- >>>> include/system/kvm_int.h | 1 + >>>> qemu-options.hx | 10 ++++++++++ >>>> target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++ >>>> 3 files changed, 49 insertions(+) >>>> >>>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h >>>> index 756a3c0a250e..fd7129824429 100644 >>>> --- a/include/system/kvm_int.h >>>> +++ b/include/system/kvm_int.h >>>> @@ -159,6 +159,7 @@ struct KVMState >>>> uint64_t kvm_eager_split_size; /* Eager Page Splitting chunk >>>> size */ >>>> struct KVMDirtyRingReaper reaper; >>>> struct KVMMsrEnergy msr_energy; >>>> + bool hypercall_patching_enabled; >>> >>> IMHO, we can just name it "hypercall_patching". >>> >>> Since it's a boolean type, true means enabled and false means disabled. >> Ok, makes sense. >>> >>>> NotifyVmexitOption notify_vmexit; >>>> uint32_t notify_window; >>>> uint32_t xen_version; >>>> diff --git a/qemu-options.hx b/qemu-options.hx >>>> index 1f862b19a676..c2e232649c19 100644 >>>> --- a/qemu-options.hx >>>> +++ b/qemu-options.hx >>>> @@ -231,6 +231,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel, >>>> " dirty-ring-size=n (KVM dirty ring GFN count, >>>> default 0)\n" >>>> " eager-split-size=n (KVM Eager Page Split chunk >>>> size, default 0, disabled. ARM only)\n" >>>> " notify-vmexit=run|internal-error| >>>> disable,notify-window=n (enable notify VM exit and set notify window, >>>> x86 only)\n" >>>> + " hypercall-patching=on|off (enable KVM's VMCALL/ >>>> VMMCALL hypercall patching quirk, x86 only)\n" >>>> " thread=single|multi (enable multi-threaded TCG)\n" >>>> " device=path (KVM device path, default /dev/ >>>> kvm)\n", QEMU_ARCH_ALL) >>>> SRST >>>> @@ -313,6 +314,15 @@ SRST >>>> open up for a specified of time (i.e. notify-window). >>>> Default: notify-vmexit=run,notify-window=0. >>>> + ``hypercall-patching=on|off`` >>>> + KVM tries to recover from the wrong hypercall instruction >>>> being used by >>>> + a guest by attempting to rewrite it to the one supported >>>> natively by >>>> + the host CPU (VMCALL on Intel, VMMCALL for AMD systems). >>>> However, this >>>> + patching may fail if the guest memory is write protected, >>>> leading to a >>>> + page fault getting propagated to the guest instead of an illegal >>>> + instruction exception. As this may confuse guests, it gets >>>> disabled by >>>> + default (x86 only). >>>> + >>>> ``device=path`` >>>> Sets the path to the KVM device node. Defaults to ``/dev/ >>>> kvm``. This >>>> option can be used to pass the KVM device to use via a file >>>> descriptor >>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>> index 56a6b9b6381a..6f5f3b95e553 100644 >>>> --- a/target/i386/kvm/kvm.c >>>> +++ b/target/i386/kvm/kvm.c >>>> @@ -3224,6 +3224,19 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) >>>> return 0; >>>> } >>>> +static int kvm_vm_disable_hypercall_patching(KVMState *s) >>>> +{ >>>> + int valid_quirks = kvm_vm_check_extension(s, >>>> KVM_CAP_DISABLE_QUIRKS2); >>>> + >>>> + if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { >>>> + return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, >>>> + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); >>>> + } >>>> + >>>> + warn_report("kvm: disabling hypercall patching not supported"); >>> >>> It's not clear it's 1) KVM doesn't support/has FIX_HYPERCALL_INSN quirk >>> or 2) KVM has FIX_HYPERCALL_INSN quirk but doesn't allow it to be >>> disabled, when KVM_X86_QUIRK_FIX_HYPERCALL_INSN is not returned in >>> KVM_CAP_DISABLE_QUIRKS2. >>> >>> If it's case 1), it can be treated as hypercall patching is disabled >>> thus no warning is expected. >>> >>> So, I think it requires a new cap in KVM to return the enabled quirks. >> KVM_CAP_DISABLE_QUIRKS2 fixes that bug of KVM_CAP_DISABLE_QUIRKS by >> doing just that: returning the mask of supported quirks when queried via >> KVM_CHECK_EXTENSION. So if KVM_X86_QUIRK_FIX_HYPERCALL_INSN is in that >> mask, it can also be disabled. If that attempt fails (for whatever >> reason), it's an error, which makes kvm_vm_enable_cap() return a >> non-zero value, triggering the warn_report("kvm: failed to disable >> hypercall patching quirk") in the caller. >> If KVM_X86_QUIRK_FIX_HYPERCALL_INSN is missing in the >> KVM_CAP_DISABLE_QUIRKS2 mask, it may either be that KVM is too old to >> even have the hypercall patching (pre-v2.6.25) or does do the patching, >> just doesn't have KVM_X86_QUIRK_FIX_HYPERCALL_INSN yet, which came in >> Linux commit f1a9761fbb00 ("KVM: x86: Allow userspace to opt out of >> hypercall patching"), which is v5.19. >> Ignoring pre-v2.6.25 kernels for a moment, we can assume that KVM will >> do the patching. So the lack of KVM_X86_QUIRK_FIX_HYPERCALL_INSN but >> having 'hypercall_patching_enabled == false' indicates that the user >> wants to disable it but QEMU cannot do so, because KVM lacks the >> extension to do so. This, IMO, legitimizes the warn_report("kvm: >> disabling hypercall patching not supported") -- as it's not supported. > > The minimum supported kernel version is 4.5 (see commit f180e367fce4). > So pre-v2.6.25 kernels is not the case. > > Surely we can list all the cases of different versions of KVM starting from v4.5 and draw the conclusion that the semantics of "valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN == 0" means KVM enables the hypercall patching quirk but don't provide the interface for userspace to disable it. So the code logic is correct. > > My statement of "I think it requires a new cap in KVM to return the enabled quirks" is more for generic consideration. i.e., QEMU can know whether a quirk is enabled or not without analysing the detailed history of KVM. > > Of course, it's more of the requirement on KVM to provide new interface and Current QEMU can do nothing on it. That is, current implementation of this PATCH is OK to me. > >>> >>>> + return 0; >>> >>> I think return 0 here is to avoid the warn_report() in the caller. But >>> for the correct semantics, we need to return -1 to indicate that it >>> fails to disable the hypercall patching? >> No, returning 0 here is very much on purpose, as you noticed, to avoid >> the warn_report() in the caller. The already issued warn_report() is the >> correct one for this case. > > We can use @Error to pass the error log instead of the trick on return value. > > e.g., > > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3228,17 +3228,24 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) > return 0; > } > > -static int kvm_vm_disable_hypercall_patching(KVMState *s) > +static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp) > { > int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); > + int ret = -1; > > if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { > - return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > - KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + if (ret) { > + error_setg_errno(errp, -ret, "kvm: failed to disable " > + "hypercall patching quirk: %s", > + strerror(-ret)); > + } > + } else { > + error_setg(errp, "kvm: disabling hypercall patching not supported"); > } > > - warn_report("kvm: disabling hypercall patching not supported"); > - return 0; > + return ret; > } > > int kvm_arch_init(MachineState *ms, KVMState *s) > @@ -3381,8 +3388,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > > if (s->hypercall_patching_enabled == false) { > - if (kvm_vm_disable_hypercall_patching(s)) { > - warn_report("kvm: failed to disable hypercall patching quirk"); > + if (kvm_vm_disable_hypercall_patching(s, &local_err)) { > + error_report_err(local_err); > } > } > >> I guess, it's a question of if disabling hypercall patching is a hard >> requirement or can soft-fail. I decided for the latter, as hypercall >> patching shouldn't be needed in most cases, so if it cannot be disabled, >> it's mostly fine to start the VM still. >>> >>>> +} >>>> + >>>> int kvm_arch_init(MachineState *ms, KVMState *s) >>>> { >>>> int ret; >>>> @@ -3363,6 +3376,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s) >>>> } >>>> } >>>> + if (s->hypercall_patching_enabled == false) { >>>> + if (kvm_vm_disable_hypercall_patching(s)) { >>>> + warn_report("kvm: failed to disable hypercall patching >>>> quirk"); >>>> + } >>>> + } >>>> + >>>> return 0; >>>> } >>>> @@ -6456,6 +6475,19 @@ void kvm_request_xsave_components(X86CPU >>>> *cpu, uint64_t mask) >>>> } >>>> } >>>> +static bool kvm_arch_get_hypercall_patching(Object *obj, Error **errp) >>>> +{ >>>> + KVMState *s = KVM_STATE(obj); >>>> + return s->hypercall_patching_enabled; >>>> +} >>>> + >>>> +static void kvm_arch_set_hypercall_patching(Object *obj, bool value, >>>> + Error **errp) >>>> +{ >>>> + KVMState *s = KVM_STATE(obj); >>>> + s->hypercall_patching_enabled = value; >>>> +} >>>> + >>>> static int kvm_arch_get_notify_vmexit(Object *obj, Error **errp) >>>> { >>>> KVMState *s = KVM_STATE(obj); >>>> @@ -6589,6 +6621,12 @@ static void >>>> kvm_arch_set_xen_evtchn_max_pirq(Object *obj, Visitor *v, >>>> void kvm_arch_accel_class_init(ObjectClass *oc) >>>> { >>>> + object_class_property_add_bool(oc, "hypercall-patching", >>>> + kvm_arch_get_hypercall_patching, >>>> + kvm_arch_set_hypercall_patching); >>>> + object_class_property_set_description(oc, "hypercall-patching", >>>> + "Enable hypercall patching >>>> quirk"); >>>> + >>>> object_class_property_add_enum(oc, "notify-vmexit", >>>> "NotifyVMexitOption", >>>> &NotifyVmexitOption_lookup, >>>> kvm_arch_get_notify_vmexit, >>> >> Thanks, >> Mathias > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 10:35 ` Mohamed Mediouni @ 2025-07-22 11:06 ` Xiaoyao Li 2025-07-22 11:16 ` Mohamed Mediouni 0 siblings, 1 reply; 14+ messages in thread From: Xiaoyao Li @ 2025-07-22 11:06 UTC (permalink / raw) To: Mohamed Mediouni Cc: Mathias Krause, qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 7/22/2025 6:35 PM, Mohamed Mediouni wrote: >> On 22. Jul 2025, at 12:27, Xiaoyao Li<xiaoyao.li@intel.com> wrote: >> >> On 7/22/2025 5:21 PM, Mathias Krause wrote: >>> On 22.07.25 05:45, Xiaoyao Li wrote: >>>> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>>>> exception (#UD) as they are just the wrong instruction for the CPU >>>>> given. But instead of forwarding the exception to the guest, KVM tries >>>>> to patch the guest instruction to match the host's actual hypercall >>>>> instruction. That is doomed to fail as read-only code is rather the >>>>> standard these days. But, instead of letting go the patching attempt and >>>>> falling back to #UD injection, KVM injects the page fault instead. >>>>> >>>>> That's wrong on multiple levels. Not only isn't that a valid exception >>>>> to be generated by these instructions, confusing attempts to handle >>>>> them. It also destroys guest state by doing so, namely the value of CR2. >>>>> >>>>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>>>> >>>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>>>> conceptually be disabled. Paolo even called out to add this very >>>>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>>>> >>>>> A new property 'hypercall-patching=on|off' is added, for the very >>>>> unlikely case that there are setups that really need the patching. >>>>> However, these would be vulnerable to memory corruption attacks freely >>>>> overwriting code as they please. So, my guess is, there are exactly 0 >>>>> systems out there requiring this quirk. >>>> The default behavior is patching the hypercall for many years. >>>> >>>> If you desire to change the default behavior, please at least keep it >>>> unchanged for old machine version. i.e., introduce compat_property, >>>> which sets KVMState->hypercall_patching_enabled to true. >>> Well, the thing is, KVM's patching is done with the effective >>> permissions of the guest which means, if the code in question isn't >>> writable from the guest's point of view, KVM's attempt to modify it will >>> fail. This failure isn't transparent for the guest as it sees a #PF >>> instead of a #UD, and that's what I'm trying to fix by disabling the quirk. >>> The hypercall patching was introduced in Linux commit 7aa81cc04781 >>> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then >>> it was based on a dedicated hypercall page that was handled by KVM to >>> use the proper instruction of the KVM module in use (VMX or SVM). >>> Patching code was fine back then, but the introduction of DEBUG_RO_DATA >>> made the patching attempts fail and, ultimately, lead to Paolo handle >>> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL >>> vs. VMMCALL if kernel text is read-only"). >>> However, his change still doesn't account for the cross-vendor live >>> migration case (Intel<->AMD), which will still be broken, causing the >>> before mentioned bogus #PF, which will just lead to misleading Oops >>> reports, confusing the poor souls, trying to make sense of it. >>> IMHO, there is no valid reason for still having the patching in place as >>> the .text of non-ancient kernel's will be write-protected, making >>> patching attempts fail. And, as they fail with a #PF instead of #UD, the >>> guest cannot even handle them appropriately, as there was no memory >>> write attempt from its point of view. Therefore the default should be to >>> disable it, IMO. This won't prevent guests making use of the wrong >>> instruction from trapping, but, at least, now they'll get the correct >>> exception vector and can handle it appropriately. >> But you don't accout for the case that guest kernel is built without CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for whatever reason the guest's text is not readonly, and the VM needs to be migrated among different vendors (Intel <-> AMD). >> >> Before this patch, the above usecase works well. But with this patch, the guest will gets #UD after migrated to different vendors. >> >> I heard from some small CSPs that they do want to the ability to live migrate VMs among Intel and AMD host. >> > Is there a particular reason to not handle that #UD as a hypercall to support that scenario? do you mean KVM handles the first hardware #UD due to the wrong opcode of hypercall by directly emulate the hypercall instead of going to emulator to patch the guest memory with modifying the guest memory? If so, I agree with it. I thought the same solution and had no bandwidth and motivation to raise the idea to KVM community. > Especially with some guest OS kernels having kernel patch protection with periodic scrubbing of r/o memory (ie Windows), doesn’t sound great to override anything in a way the guest OS kernel can notice… ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 11:06 ` Xiaoyao Li @ 2025-07-22 11:16 ` Mohamed Mediouni 0 siblings, 0 replies; 14+ messages in thread From: Mohamed Mediouni @ 2025-07-22 11:16 UTC (permalink / raw) To: Xiaoyao Li Cc: Mathias Krause, qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson > On 22. Jul 2025, at 13:06, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 7/22/2025 6:35 PM, Mohamed Mediouni wrote: >>> On 22. Jul 2025, at 12:27, Xiaoyao Li<xiaoyao.li@intel.com> wrote: >>> >>> On 7/22/2025 5:21 PM, Mathias Krause wrote: >>>> On 22.07.25 05:45, Xiaoyao Li wrote: >>>>> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>>>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>>>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>>>>> exception (#UD) as they are just the wrong instruction for the CPU >>>>>> given. But instead of forwarding the exception to the guest, KVM tries >>>>>> to patch the guest instruction to match the host's actual hypercall >>>>>> instruction. That is doomed to fail as read-only code is rather the >>>>>> standard these days. But, instead of letting go the patching attempt and >>>>>> falling back to #UD injection, KVM injects the page fault instead. >>>>>> >>>>>> That's wrong on multiple levels. Not only isn't that a valid exception >>>>>> to be generated by these instructions, confusing attempts to handle >>>>>> them. It also destroys guest state by doing so, namely the value of CR2. >>>>>> >>>>>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>>>>> >>>>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>>>>> conceptually be disabled. Paolo even called out to add this very >>>>>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>>>>> >>>>>> A new property 'hypercall-patching=on|off' is added, for the very >>>>>> unlikely case that there are setups that really need the patching. >>>>>> However, these would be vulnerable to memory corruption attacks freely >>>>>> overwriting code as they please. So, my guess is, there are exactly 0 >>>>>> systems out there requiring this quirk. >>>>> The default behavior is patching the hypercall for many years. >>>>> >>>>> If you desire to change the default behavior, please at least keep it >>>>> unchanged for old machine version. i.e., introduce compat_property, >>>>> which sets KVMState->hypercall_patching_enabled to true. >>>> Well, the thing is, KVM's patching is done with the effective >>>> permissions of the guest which means, if the code in question isn't >>>> writable from the guest's point of view, KVM's attempt to modify it will >>>> fail. This failure isn't transparent for the guest as it sees a #PF >>>> instead of a #UD, and that's what I'm trying to fix by disabling the quirk. >>>> The hypercall patching was introduced in Linux commit 7aa81cc04781 >>>> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then >>>> it was based on a dedicated hypercall page that was handled by KVM to >>>> use the proper instruction of the KVM module in use (VMX or SVM). >>>> Patching code was fine back then, but the introduction of DEBUG_RO_DATA >>>> made the patching attempts fail and, ultimately, lead to Paolo handle >>>> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL >>>> vs. VMMCALL if kernel text is read-only"). >>>> However, his change still doesn't account for the cross-vendor live >>>> migration case (Intel<->AMD), which will still be broken, causing the >>>> before mentioned bogus #PF, which will just lead to misleading Oops >>>> reports, confusing the poor souls, trying to make sense of it. >>>> IMHO, there is no valid reason for still having the patching in place as >>>> the .text of non-ancient kernel's will be write-protected, making >>>> patching attempts fail. And, as they fail with a #PF instead of #UD, the >>>> guest cannot even handle them appropriately, as there was no memory >>>> write attempt from its point of view. Therefore the default should be to >>>> disable it, IMO. This won't prevent guests making use of the wrong >>>> instruction from trapping, but, at least, now they'll get the correct >>>> exception vector and can handle it appropriately. >>> But you don't accout for the case that guest kernel is built without CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for whatever reason the guest's text is not readonly, and the VM needs to be migrated among different vendors (Intel <-> AMD). >>> >>> Before this patch, the above usecase works well. But with this patch, the guest will gets #UD after migrated to different vendors. >>> >>> I heard from some small CSPs that they do want to the ability to live migrate VMs among Intel and AMD host. >>> >> Is there a particular reason to not handle that #UD as a hypercall to support that scenario? > > do you mean KVM handles the first hardware #UD due to the wrong opcode of hypercall by directly emulate the hypercall instead of going to emulator to patch the guest memory with modifying the guest memory? > > If so, I agree with it. I thought the same solution and had no bandwidth and motivation to raise the idea to KVM community. > Yes, I think that’d be the best way to go in this case… -Mohamed >> Especially with some guest OS kernels having kernel patch protection with periodic scrubbing of r/o memory (ie Windows), doesn’t sound great to override anything in a way the guest OS kernel can notice… > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 10:27 ` Xiaoyao Li 2025-07-22 10:35 ` Mohamed Mediouni @ 2025-07-22 11:15 ` Daniel P. Berrangé 2025-07-22 12:21 ` Xiaoyao Li 2025-07-22 20:08 ` Mathias Krause 2 siblings, 1 reply; 14+ messages in thread From: Daniel P. Berrangé @ 2025-07-22 11:15 UTC (permalink / raw) To: Xiaoyao Li Cc: Mathias Krause, qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On Tue, Jul 22, 2025 at 06:27:45PM +0800, Xiaoyao Li wrote: > On 7/22/2025 5:21 PM, Mathias Krause wrote: > > On 22.07.25 05:45, Xiaoyao Li wrote: > > > On 6/20/2025 3:42 AM, Mathias Krause wrote: > > > > KVM has a weird behaviour when a guest executes VMCALL on an AMD system > > > > or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode > > > > exception (#UD) as they are just the wrong instruction for the CPU > > > > given. But instead of forwarding the exception to the guest, KVM tries > > > > to patch the guest instruction to match the host's actual hypercall > > > > instruction. That is doomed to fail as read-only code is rather the > > > > standard these days. But, instead of letting go the patching attempt and > > > > falling back to #UD injection, KVM injects the page fault instead. > > > > > > > > That's wrong on multiple levels. Not only isn't that a valid exception > > > > to be generated by these instructions, confusing attempts to handle > > > > them. It also destroys guest state by doing so, namely the value of CR2. > > > > > > > > Sean attempted to fix that in KVM[1] but the patch was never applied. > > > > > > > > Later, Oliver added a quirk bit in [2] so the behaviour can, at least, > > > > conceptually be disabled. Paolo even called out to add this very > > > > functionality to disable the quirk in QEMU[3]. So lets just do it. > > > > > > > > A new property 'hypercall-patching=on|off' is added, for the very > > > > unlikely case that there are setups that really need the patching. > > > > However, these would be vulnerable to memory corruption attacks freely > > > > overwriting code as they please. So, my guess is, there are exactly 0 > > > > systems out there requiring this quirk. > > > > > > The default behavior is patching the hypercall for many years. > > > > > > If you desire to change the default behavior, please at least keep it > > > unchanged for old machine version. i.e., introduce compat_property, > > > which sets KVMState->hypercall_patching_enabled to true. > > > > Well, the thing is, KVM's patching is done with the effective > > permissions of the guest which means, if the code in question isn't > > writable from the guest's point of view, KVM's attempt to modify it will > > fail. This failure isn't transparent for the guest as it sees a #PF > > instead of a #UD, and that's what I'm trying to fix by disabling the quirk. > > > > The hypercall patching was introduced in Linux commit 7aa81cc04781 > > ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then > > it was based on a dedicated hypercall page that was handled by KVM to > > use the proper instruction of the KVM module in use (VMX or SVM). > > > > Patching code was fine back then, but the introduction of DEBUG_RO_DATA > > made the patching attempts fail and, ultimately, lead to Paolo handle > > this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL > > vs. VMMCALL if kernel text is read-only"). > > > > However, his change still doesn't account for the cross-vendor live > > migration case (Intel<->AMD), which will still be broken, causing the > > before mentioned bogus #PF, which will just lead to misleading Oops > > reports, confusing the poor souls, trying to make sense of it. > > > > IMHO, there is no valid reason for still having the patching in place as > > the .text of non-ancient kernel's will be write-protected, making > > patching attempts fail. And, as they fail with a #PF instead of #UD, the > > guest cannot even handle them appropriately, as there was no memory > > write attempt from its point of view. Therefore the default should be to > > disable it, IMO. This won't prevent guests making use of the wrong > > instruction from trapping, but, at least, now they'll get the correct > > exception vector and can handle it appropriately. > > But you don't accout for the case that guest kernel is built without > CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for > whatever reason the guest's text is not readonly, and the VM needs to be > migrated among different vendors (Intel <-> AMD). > > Before this patch, the above usecase works well. But with this patch, the > guest will gets #UD after migrated to different vendors. > > I heard from some small CSPs that they do want to the ability to live > migrate VMs among Intel and AMD host. Usually CSPs don't have full control over what their customers are running as a guest. If their customers are running mainstream modern guest OS, CONFIG_STRICT_KERNEL_RWX is pretty likely to be set, so presumably migration between Intel & AMD will not work and this isn't making it worse ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 11:15 ` Daniel P. Berrangé @ 2025-07-22 12:21 ` Xiaoyao Li 2025-07-22 20:13 ` Mathias Krause 0 siblings, 1 reply; 14+ messages in thread From: Xiaoyao Li @ 2025-07-22 12:21 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Mathias Krause, qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 7/22/2025 7:15 PM, Daniel P. Berrangé wrote: > On Tue, Jul 22, 2025 at 06:27:45PM +0800, Xiaoyao Li wrote: >> On 7/22/2025 5:21 PM, Mathias Krause wrote: >>> On 22.07.25 05:45, Xiaoyao Li wrote: >>>> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>>>> exception (#UD) as they are just the wrong instruction for the CPU >>>>> given. But instead of forwarding the exception to the guest, KVM tries >>>>> to patch the guest instruction to match the host's actual hypercall >>>>> instruction. That is doomed to fail as read-only code is rather the >>>>> standard these days. But, instead of letting go the patching attempt and >>>>> falling back to #UD injection, KVM injects the page fault instead. >>>>> >>>>> That's wrong on multiple levels. Not only isn't that a valid exception >>>>> to be generated by these instructions, confusing attempts to handle >>>>> them. It also destroys guest state by doing so, namely the value of CR2. >>>>> >>>>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>>>> >>>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>>>> conceptually be disabled. Paolo even called out to add this very >>>>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>>>> >>>>> A new property 'hypercall-patching=on|off' is added, for the very >>>>> unlikely case that there are setups that really need the patching. >>>>> However, these would be vulnerable to memory corruption attacks freely >>>>> overwriting code as they please. So, my guess is, there are exactly 0 >>>>> systems out there requiring this quirk. >>>> >>>> The default behavior is patching the hypercall for many years. >>>> >>>> If you desire to change the default behavior, please at least keep it >>>> unchanged for old machine version. i.e., introduce compat_property, >>>> which sets KVMState->hypercall_patching_enabled to true. >>> >>> Well, the thing is, KVM's patching is done with the effective >>> permissions of the guest which means, if the code in question isn't >>> writable from the guest's point of view, KVM's attempt to modify it will >>> fail. This failure isn't transparent for the guest as it sees a #PF >>> instead of a #UD, and that's what I'm trying to fix by disabling the quirk. >>> >>> The hypercall patching was introduced in Linux commit 7aa81cc04781 >>> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then >>> it was based on a dedicated hypercall page that was handled by KVM to >>> use the proper instruction of the KVM module in use (VMX or SVM). >>> >>> Patching code was fine back then, but the introduction of DEBUG_RO_DATA >>> made the patching attempts fail and, ultimately, lead to Paolo handle >>> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL >>> vs. VMMCALL if kernel text is read-only"). >>> >>> However, his change still doesn't account for the cross-vendor live >>> migration case (Intel<->AMD), which will still be broken, causing the >>> before mentioned bogus #PF, which will just lead to misleading Oops >>> reports, confusing the poor souls, trying to make sense of it. >>> >>> IMHO, there is no valid reason for still having the patching in place as >>> the .text of non-ancient kernel's will be write-protected, making >>> patching attempts fail. And, as they fail with a #PF instead of #UD, the >>> guest cannot even handle them appropriately, as there was no memory >>> write attempt from its point of view. Therefore the default should be to >>> disable it, IMO. This won't prevent guests making use of the wrong >>> instruction from trapping, but, at least, now they'll get the correct >>> exception vector and can handle it appropriately. >> >> But you don't accout for the case that guest kernel is built without >> CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for >> whatever reason the guest's text is not readonly, and the VM needs to be >> migrated among different vendors (Intel <-> AMD). >> >> Before this patch, the above usecase works well. But with this patch, the >> guest will gets #UD after migrated to different vendors. >> >> I heard from some small CSPs that they do want to the ability to live >> migrate VMs among Intel and AMD host. > > Usually CSPs don't have full control over what their customers > are running as a guest. If their customers are running mainstream > modern guest OS, CONFIG_STRICT_KERNEL_RWX is pretty likely to be > set, so presumably migration between Intel & AMD will not work > and this isn't making it worse ? If breaking some usecase is not a concern, then I'm fine with no compat property. > With regards, > Daniel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 12:21 ` Xiaoyao Li @ 2025-07-22 20:13 ` Mathias Krause 0 siblings, 0 replies; 14+ messages in thread From: Mathias Krause @ 2025-07-22 20:13 UTC (permalink / raw) To: Xiaoyao Li, Daniel P. Berrangé Cc: qemu-devel, Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 22.07.25 14:21, Xiaoyao Li wrote: > On 7/22/2025 7:15 PM, Daniel P. Berrangé wrote: >> [...] >> >> Usually CSPs don't have full control over what their customers >> are running as a guest. If their customers are running mainstream >> modern guest OS, CONFIG_STRICT_KERNEL_RWX is pretty likely to be >> set, so presumably migration between Intel & AMD will not work >> and this isn't making it worse ? > > If breaking some usecase is not a concern, then I'm fine with no compat > property. Well, there's still the chicken bit `-accel kvm,hypercall-patching=on` one could make use of if (and really only if) that's really needed. But I'd rather have the guest see a proper exception emulation of trying to execute an unsupported instruction than seeing a bogus #PF. Thanks, Mathias ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-22 10:27 ` Xiaoyao Li 2025-07-22 10:35 ` Mohamed Mediouni 2025-07-22 11:15 ` Daniel P. Berrangé @ 2025-07-22 20:08 ` Mathias Krause 2 siblings, 0 replies; 14+ messages in thread From: Mathias Krause @ 2025-07-22 20:08 UTC (permalink / raw) To: Xiaoyao Li, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 22.07.25 12:27, Xiaoyao Li wrote: > On 7/22/2025 5:21 PM, Mathias Krause wrote: >> IMHO, there is no valid reason for still having the patching in place as >> the .text of non-ancient kernel's will be write-protected, making >> patching attempts fail. And, as they fail with a #PF instead of #UD, the >> guest cannot even handle them appropriately, as there was no memory >> write attempt from its point of view. Therefore the default should be to >> disable it, IMO. This won't prevent guests making use of the wrong >> instruction from trapping, but, at least, now they'll get the correct >> exception vector and can handle it appropriately. > > But you don't accout for the case that guest kernel is built without > CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or > for whatever reason the guest's text is not readonly, and the VM needs > to be migrated among different vendors (Intel <-> AMD). > > Before this patch, the above usecase works well. But with this patch, > the guest will gets #UD after migrated to different vendors. I strongly doubt that use case works well, probably doesn't work at all right now. There's so much more to handle on the kernel side for such a thing to actually "work". Think of mitigations like RETPOLINE, RETHUNK, UNRET, SRSO, GDS,... and all the vendor-specific MSRs that need to be switched over. No way this "just works". Not patching VMCALL to VMMCALL or vice versa is the least of your problems in such a scenario. > I heard from some small CSPs that they do want to the ability to live > migrate VMs among Intel and AMD host. Well, they'll have a hard time trying to get it to work, sorry. >> [...] >> >> Ignoring pre-v2.6.25 kernels for a moment, we can assume that KVM will >> do the patching. So the lack of KVM_X86_QUIRK_FIX_HYPERCALL_INSN but >> having 'hypercall_patching_enabled == false' indicates that the user >> wants to disable it but QEMU cannot do so, because KVM lacks the >> extension to do so. This, IMO, legitimizes the warn_report("kvm: >> disabling hypercall patching not supported") -- as it's not supported. > > The minimum supported kernel version is 4.5 (see commit f180e367fce4). > So pre-v2.6.25 kernels is not the case. Perfect! > Surely we can list all the cases of different versions of KVM starting > from v4.5 and draw the conclusion that the semantics of "valid_quirks & > KVM_X86_QUIRK_FIX_HYPERCALL_INSN == 0" means KVM enables the hypercall > patching quirk but don't provide the interface for userspace to disable > it. So the code logic is correct. Thanks for confirming! > My statement of "I think it requires a new cap in KVM to return the > enabled quirks" is more for generic consideration. i.e., QEMU can know > whether a quirk is enabled or not without analysing the detailed history > of KVM. Well, all quirks are enabled by default, only explicitly disabled ones are, well, disabled. And that requires actively doing so via kvm_vm_enable_cap(KVM_CAP_DISABLE_QUIRKS[2]). Surely, QEMU can track these calls on its own if it really wants to. > Of course, it's more of the requirement on KVM to provide new interface > and Current QEMU can do nothing on it. That is, current implementation > of this PATCH is OK to me. Ok, I'll send a v2 incorporating the member name change to 'hypercall_patching'. >>> [...] >>> I think return 0 here is to avoid the warn_report() in the caller. But >>> for the correct semantics, we need to return -1 to indicate that it >>> fails to disable the hypercall patching? >> >> No, returning 0 here is very much on purpose, as you noticed, to avoid >> the warn_report() in the caller. The already issued warn_report() is the >> correct one for this case. > > We can use @Error to pass the error log instead of the trick on return > value. > > e.g., > > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -3228,17 +3228,24 @@ static int kvm_vm_enable_energy_msrs(KVMState *s) > return 0; > } > > -static int kvm_vm_disable_hypercall_patching(KVMState *s) > +static int kvm_vm_disable_hypercall_patching(KVMState *s, Error **errp) > { > int valid_quirks = kvm_vm_check_extension(s, KVM_CAP_DISABLE_QUIRKS2); > + int ret = -1; > > if (valid_quirks & KVM_X86_QUIRK_FIX_HYPERCALL_INSN) { > - return kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > - KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + ret = kvm_vm_enable_cap(s, KVM_CAP_DISABLE_QUIRKS2, 0, > + KVM_X86_QUIRK_FIX_HYPERCALL_INSN); > + if (ret) { > + error_setg_errno(errp, -ret, "kvm: failed to disable " > + "hypercall patching quirk: %s", > + strerror(-ret)); > + } > + } else { > + error_setg(errp, "kvm: disabling hypercall patching not > supported"); > } > > - warn_report("kvm: disabling hypercall patching not supported"); > - return 0; > + return ret; > } > > int kvm_arch_init(MachineState *ms, KVMState *s) > @@ -3381,8 +3388,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > } > > if (s->hypercall_patching_enabled == false) { > - if (kvm_vm_disable_hypercall_patching(s)) { > - warn_report("kvm: failed to disable hypercall patching > quirk"); > + if (kvm_vm_disable_hypercall_patching(s, &local_err)) { > + error_report_err(local_err); > } > } > Ohh, neat. I'll integrate that as well. Thanks! Thanks, Mathias ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-06-19 19:42 [PATCH] i386/kvm: Disable hypercall patching quirk by default Mathias Krause 2025-07-21 10:22 ` Mathias Krause 2025-07-22 3:45 ` Xiaoyao Li @ 2025-07-23 9:26 ` David Woodhouse 2025-08-04 9:35 ` Mathias Krause 2 siblings, 1 reply; 14+ messages in thread From: David Woodhouse @ 2025-07-23 9:26 UTC (permalink / raw) To: Mathias Krause, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson [-- Attachment #1: Type: text/plain, Size: 2089 bytes --] On Thu, 2025-06-19 at 21:42 +0200, Mathias Krause wrote: > KVM has a weird behaviour when a guest executes VMCALL on an AMD system > or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode > exception (#UD) as they are just the wrong instruction for the CPU > given. But instead of forwarding the exception to the guest, KVM tries > to patch the guest instruction to match the host's actual hypercall > instruction. That is doomed to fail as read-only code is rather the > standard these days. But, instead of letting go the patching attempt and > falling back to #UD injection, KVM injects the page fault instead. > > That's wrong on multiple levels. Not only isn't that a valid exception > to be generated by these instructions, confusing attempts to handle > them. It also destroys guest state by doing so, namely the value of CR2. > > Sean attempted to fix that in KVM[1] but the patch was never applied. > > Later, Oliver added a quirk bit in [2] so the behaviour can, at least, > conceptually be disabled. Paolo even called out to add this very > functionality to disable the quirk in QEMU[3]. So lets just do it. > > A new property 'hypercall-patching=on|off' is added, for the very > unlikely case that there are setups that really need the patching. > However, these would be vulnerable to memory corruption attacks freely > overwriting code as they please. So, my guess is, there are exactly 0 > systems out there requiring this quirk. I am always wary of making assumptions about how guests behave in the general case. Every time we do so, we seem to find that *some* ancient version of some random network applicance — or FreeBSD — does exactly the thing we considered unlikely. And customers get sad. As a general rule, before disabling a thing that even *might* have worked for a guest, I'd like to run in a 'warning' mode first. Only after running the whole fleet with such a warning and observing that it *doesn't* trigger, can we actually switch the thing *off*. Can we have 'hypercall-patching=on|off|log' ? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5069 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] i386/kvm: Disable hypercall patching quirk by default 2025-07-23 9:26 ` David Woodhouse @ 2025-08-04 9:35 ` Mathias Krause 0 siblings, 0 replies; 14+ messages in thread From: Mathias Krause @ 2025-08-04 9:35 UTC (permalink / raw) To: David Woodhouse, qemu-devel Cc: Paolo Bonzini, Marcelo Tosatti, kvm, Oliver Upton, Sean Christopherson On 23.07.25 11:26, David Woodhouse wrote: > On Thu, 2025-06-19 at 21:42 +0200, Mathias Krause wrote: >> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >> exception (#UD) as they are just the wrong instruction for the CPU >> given. But instead of forwarding the exception to the guest, KVM tries >> to patch the guest instruction to match the host's actual hypercall >> instruction. That is doomed to fail as read-only code is rather the >> standard these days. But, instead of letting go the patching attempt and >> falling back to #UD injection, KVM injects the page fault instead. >> >> That's wrong on multiple levels. Not only isn't that a valid exception >> to be generated by these instructions, confusing attempts to handle >> them. It also destroys guest state by doing so, namely the value of CR2. >> >> Sean attempted to fix that in KVM[1] but the patch was never applied. >> >> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >> conceptually be disabled. Paolo even called out to add this very >> functionality to disable the quirk in QEMU[3]. So lets just do it. >> >> A new property 'hypercall-patching=on|off' is added, for the very >> unlikely case that there are setups that really need the patching. >> However, these would be vulnerable to memory corruption attacks freely >> overwriting code as they please. So, my guess is, there are exactly 0 >> systems out there requiring this quirk. > > I am always wary of making assumptions about how guests behave in the > general case. Every time we do so, we seem to find that *some* ancient > version of some random network applicance — or FreeBSD — does exactly > the thing we considered unlikely. And customers get sad. > > As a general rule, before disabling a thing that even *might* have > worked for a guest, I'd like to run in a 'warning' mode first. Only > after running the whole fleet with such a warning and observing that it > *doesn't* trigger, can we actually switch the thing *off*. Looks like I was overly optimistic. There are, of course, use cases that rely on the hypercall patching, even if it's just for testing purposes. One of these are the KUT tests. I tried to fix these[1], however, there are probably more such mini-kernels, so I reverted back to not changing the default behaviour and only provided a knob to disabled the quirk, making users to manually opt-in to it[2]. > > Can we have 'hypercall-patching=on|off|log' ? I'd like to have the 'log' option as well. But as KVM does the patching on its own, this would require QEMU to analyze and react to related #UD exceptions (and possibly #PF to handle currently failing uses cases with read-only code too) further, I'd rather not want to do. Another option would be to do a WARN_ON[_ONCE]() in KVM if it does the patching. But, then, existing use cases would suddenly trigger a kernel warning, which used to work before. Again, something users probably don't want to see. :/ I guess, we have to stick around with the default but make users aware of the option to disable the patching themselves. Thanks, Mathias [1] https://lore.kernel.org/kvm/20250724191050.1988675-1-minipli@grsecurity.net/ [2] https://lore.kernel.org/kvm/20250801131226.2729893-1-minipli@grsecurity.net/ ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-08-04 9:35 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-19 19:42 [PATCH] i386/kvm: Disable hypercall patching quirk by default Mathias Krause 2025-07-21 10:22 ` Mathias Krause 2025-07-22 3:45 ` Xiaoyao Li 2025-07-22 9:21 ` Mathias Krause 2025-07-22 10:27 ` Xiaoyao Li 2025-07-22 10:35 ` Mohamed Mediouni 2025-07-22 11:06 ` Xiaoyao Li 2025-07-22 11:16 ` Mohamed Mediouni 2025-07-22 11:15 ` Daniel P. Berrangé 2025-07-22 12:21 ` Xiaoyao Li 2025-07-22 20:13 ` Mathias Krause 2025-07-22 20:08 ` Mathias Krause 2025-07-23 9:26 ` David Woodhouse 2025-08-04 9:35 ` Mathias Krause
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).