* RFC: few questions about hypercall patching in KVM
@ 2022-12-14 17:57 Maxim Levitsky
2022-12-15 0:53 ` Sean Christopherson
0 siblings, 1 reply; 4+ messages in thread
From: Maxim Levitsky @ 2022-12-14 17:57 UTC (permalink / raw)
To: kvm@vger.kernel.org
Hi!
Recently I had to debug a case of KVM's hypercall patching failing in a special case of running qemu under valgrind.
In nutshell what is happening is that qemu uses 'cpuid' instruction to gather some
info about the host and some of it
is passed to the guest cpuid, and that includes the vendor string.
Under valgrind it emulates the CPU (aka TCG), so qemu sees virtual cpu, with virtual cpuid which
has hardcoded vendor string
the 'GenuineIntel', so when your run qemu with KVM on AMD host, the guest will see Intel's vendor string regardless of other
'-cpu' settings (even -cpu host)
This ensures
that the guest uses the wrong hypercall instruction (vmcall instead of vmmcall), and sometimes it will use it after the guest kernel write protects its memory.
This will lead to a failure of the
hypercall patching as the kvm writes to the guest memory
as if the instruction wrote to it, and this checks the permissions in the guest paging.
So the VMCALL instruction gets totally unexpected #PF.
1. Now I suggest that when hypercall patching fails, can we do kvm_vm_bugged() instead of forwarding the hypercall?
I know that vmmcall can be executed from ring 3 as well, so I can limit this to hypercall patching that happens when guest
ring is 0.
2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case instead of patching? Any technical reasons for not doing this?
Few guests use it so the perf impact should be very small.
I can send a patch for either of two options if you agree with me.
Best regards,
Maxim Levitsky
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RFC: few questions about hypercall patching in KVM
2022-12-14 17:57 RFC: few questions about hypercall patching in KVM Maxim Levitsky
@ 2022-12-15 0:53 ` Sean Christopherson
2022-12-15 10:29 ` Maxim Levitsky
0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2022-12-15 0:53 UTC (permalink / raw)
To: Maxim Levitsky; +Cc: kvm@vger.kernel.org
On Wed, Dec 14, 2022, Maxim Levitsky wrote:
> Hi!
>
>
> Recently I had to debug a case of KVM's hypercall patching failing in a
> special case of running qemu under valgrind.
>
> In nutshell what is happening is that qemu uses 'cpuid' instruction to gather
> some info about the host and some of it is passed to the guest cpuid, and
> that includes the vendor string.
>
> Under valgrind it emulates the CPU (aka TCG), so qemu sees virtual cpu, with
> virtual cpuid which has hardcoded vendor string the 'GenuineIntel', so when
> your run qemu with KVM on AMD host, the guest will see Intel's vendor string
> regardless of other '-cpu' settings (even -cpu host)
>
> This ensures that the guest uses the wrong hypercall instruction (vmcall
> instead of vmmcall), and sometimes it will use it after the guest kernel
> write protects its memory. This will lead to a failure of the hypercall
> patching as the kvm writes to the guest memory as if the instruction wrote to
> it, and this checks the permissions in the guest paging.
>
> So the VMCALL instruction gets totally unexpected #PF.
Yep, been there, done that :-)
> 1. Now I suggest that when hypercall patching fails, can we do
> kvm_vm_bugged() instead of forwarding the hypercall? I know that vmmcall can
> be executed from ring 3 as well, so I can limit this to hypercall patching
> that happens when guest ring is 0.
And L1. But why? It's not a KVM bug per se, it's a known deficiency in KVM's
emulator. What to do in response to the failure should be up to userspace. The
real "fix" is to disable the quirk in QEMU.
> 2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case
> instead of patching? Any technical reasons for not doing this? Few guests
> use it so the perf impact should be very small.
Nested is basically impossible to get right[1][2]. IIRC, calling into
kvm_emulate_hypercall() from the emulator also gets messy (I think I tried doing
exactly this at some point).
[1] https://lore.kernel.org/all/Yjyt7tKSDhW66fnR@google.com
[2] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RFC: few questions about hypercall patching in KVM
2022-12-15 0:53 ` Sean Christopherson
@ 2022-12-15 10:29 ` Maxim Levitsky
2022-12-19 18:27 ` Sean Christopherson
0 siblings, 1 reply; 4+ messages in thread
From: Maxim Levitsky @ 2022-12-15 10:29 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm@vger.kernel.org
On Thu, 2022-12-15 at 00:53 +0000, Sean Christopherson wrote:
> On Wed, Dec 14, 2022, Maxim Levitsky wrote:
> > Hi!
> >
> >
> > Recently I had to debug a case of KVM's hypercall patching failing in a
> > special case of running qemu under valgrind.
> >
> > In nutshell what is happening is that qemu uses 'cpuid' instruction to gather
> > some info about the host and some of it is passed to the guest cpuid, and
> > that includes the vendor string.
> >
> > Under valgrind it emulates the CPU (aka TCG), so qemu sees virtual cpu, with
> > virtual cpuid which has hardcoded vendor string the 'GenuineIntel', so when
> > your run qemu with KVM on AMD host, the guest will see Intel's vendor string
> > regardless of other '-cpu' settings (even -cpu host)
> >
> > This ensures that the guest uses the wrong hypercall instruction (vmcall
> > instead of vmmcall), and sometimes it will use it after the guest kernel
> > write protects its memory. This will lead to a failure of the hypercall
> > patching as the kvm writes to the guest memory as if the instruction wrote to
> > it, and this checks the permissions in the guest paging.
> >
> > So the VMCALL instruction gets totally unexpected #PF.
>
> Yep, been there, done that :-)
>
> > 1. Now I suggest that when hypercall patching fails, can we do
> > kvm_vm_bugged() instead of forwarding the hypercall? I know that vmmcall can
> > be executed from ring 3 as well, so I can limit this to hypercall patching
> > that happens when guest ring is 0.
>
> And L1. But why? It's not a KVM bug per se, it's a known deficiency in KVM's
> emulator. What to do in response to the failure should be up to userspace. The
> real "fix" is to disable the quirk in QEMU.
Yes, and L1, you are right - I thought about nested case, that maybe it is possible
to eliminate it, but you are right, it can't be eliminated.
My reasoning for doing kvm_vm_bugged() (or returning X86EMUL_UNHANDLEABLE even better maybe,
to give userspace a theoretical chance of dealing with it)
is to make the error at least a bit more visible.
(I for example thought for a while that there is some memory corrupion in the guest caused by valgrind,
which cause that #PF)
>
> > 2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case
> > instead of patching? Any technical reasons for not doing this? Few guests
> > use it so the perf impact should be very small.
>
> Nested is basically impossible to get right[1][2]. IIRC, calling into
> kvm_emulate_hypercall() from the emulator also gets messy (I think I tried doing
> exactly this at some point).
It could very well be, however if L0's KVM starts to emulate both VMMCALL and VMCALL
instructions (when the quirk is enabled) then it will be the closest to what KVM always did,
and it will not overwrite the guest memory.
About calling into kvm_emulate_hypercall I can expect trouble, but I would be very happy
if you recall which problems did you face.
Note that at least for a nested guest, we can avoid patching right away because both VMMCALL and VMCALL
that are done in nested guest will never need to call kvm_emulate_hypercall().
VMCALL is always intercepted by L1 as defined by VMX spec, while VMMCALL if not intercepted causes #UD
in the guest.
In those cases emulation is very simple.
As for L1, we already have a precedent: #GP is sometimes
emulated as SVM instruction due to the AMD's errata.
Look at gp_interception:
You first decode the instruciton, and if it is VMCALL, then call the kvm_emulate_hypercall()
This way there is no recursive emulator call.
What do you think?
Best regards,
Maxim Levitsky
>
> [1] https://lore.kernel.org/all/Yjyt7tKSDhW66fnR@google.com
> [2] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: RFC: few questions about hypercall patching in KVM
2022-12-15 10:29 ` Maxim Levitsky
@ 2022-12-19 18:27 ` Sean Christopherson
0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2022-12-19 18:27 UTC (permalink / raw)
To: Maxim Levitsky; +Cc: kvm@vger.kernel.org
On Thu, Dec 15, 2022, Maxim Levitsky wrote:
> On Thu, 2022-12-15 at 00:53 +0000, Sean Christopherson wrote:
> > On Wed, Dec 14, 2022, Maxim Levitsky wrote:
> > > 1. Now I suggest that when hypercall patching fails, can we do
> > > kvm_vm_bugged() instead of forwarding the hypercall? I know that vmmcall can
> > > be executed from ring 3 as well, so I can limit this to hypercall patching
> > > that happens when guest ring is 0.
> >
> > And L1. But why? It's not a KVM bug per se, it's a known deficiency in KVM's
> > emulator. What to do in response to the failure should be up to userspace. The
> > real "fix" is to disable the quirk in QEMU.
>
> Yes, and L1, you are right - I thought about nested case, that maybe it is possible
> to eliminate it, but you are right, it can't be eliminated.
>
> My reasoning for doing kvm_vm_bugged() (or returning X86EMUL_UNHANDLEABLE
> even better maybe, to give userspace a theoretical chance of dealing with it)
>
> is to make the error at least a bit more visible. (I for example thought for
> a while that there is some memory corrupion in the guest caused by valgrind,
> which cause that #PF)
Yeah, the #PF is nasty, but bugging the VM isn't much better, and based on past
analysis, gracefully getting out to userspace isn't trivial.
https://lore.kernel.org/all/YUNqEeWg32kNwfO8@google.com
> > > 2. Why can't we just emulate the VMCALL/VMMCALL instruction in this case
> > > instead of patching? Any technical reasons for not doing this? Few guests
> > > use it so the perf impact should be very small.
> >
> > Nested is basically impossible to get right[1][2]. IIRC, calling into
> > kvm_emulate_hypercall() from the emulator also gets messy (I think I tried doing
> > exactly this at some point).
>
> It could very well be, however if L0's KVM starts to emulate both VMMCALL and
> VMCALL instructions (when the quirk is enabled) then it will be the closest
> to what KVM always did, and it will not overwrite the guest memory.
>
> About calling into kvm_emulate_hypercall I can expect trouble, but I would be
> very happy if you recall which problems did you face.
The above link has more details than I can recall.
> Note that at least for a nested guest, we can avoid patching right away
> because both VMMCALL and VMCALL that are done in nested guest will never need
> to call kvm_emulate_hypercall().
>
> VMCALL is always intercepted by L1 as defined by VMX spec, while VMMCALL if
> not intercepted causes #UD in the guest.
>
> In those cases emulation is very simple.
>
> As for L1, we already have a precedent: #GP is sometimes emulated as SVM
> instruction due to the AMD's errata.
>
>
> Look at gp_interception:
>
> You first decode the instruciton, and if it is VMCALL, then call the
> kvm_emulate_hypercall() This way there is no recursive emulator call.
>
> What do you think?
I don't love the idea of expanding out-of-emulator emulation, especially since
the behavior is quirky, i.e. KVM shouldn't emulate the wrong hypercall instruction
if userspace has disabled KVM_X86_QUIRK_FIX_HYPERCALL_INSN.
My vote is to have QEMU disable the quirk, and if necessary, "fix" QEMU's TCG to
enumerate the correct vendor.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-12-19 18:28 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-14 17:57 RFC: few questions about hypercall patching in KVM Maxim Levitsky
2022-12-15 0:53 ` Sean Christopherson
2022-12-15 10:29 ` Maxim Levitsky
2022-12-19 18:27 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox