kvm.vger.kernel.org archive mirror
* Problem with vmrun in an interrupt shadow
@ 2025-02-07 12:44 Doug Covelli
  2025-02-07 23:54 ` Sean Christopherson
  0 siblings, 1 reply; 2+ messages in thread
From: Doug Covelli @ 2025-02-07 12:44 UTC (permalink / raw)
  To: kvm

To test support for nested virtualization I was running a VM (L2) on a
debug build of ESX (L1) on VMware Workstation/KVM (L0).  This
consistently resulted in an ASSERT in L1 firing as the interrupt
shadow bit in the VMCB was set on an #NPF exit that occurred when
vectoring through the IDT to deliver an interrupt to L2.

Some details from our exit recorder are below.  Basically what
happened is that L1 resumed L2 after handling an I/O exit and
attempted to inject an internal interrupt with vector 0x68.  This
resulted in an #NPF exit while vectoring through the IDT to deliver the
interrupt to the guest, with the interrupt shadow bit set, which our
code does not expect.  There is no reason for the interrupt shadow
bit to be set, and neither L1 nor L0 was setting it.

This turns out to be due to a quirk on AMD where a 'vmrun' immediately
after an 'sti' causes the interrupt shadow bit to leak into the guest
state in the VMCB.  Jim Mattson discovered this back when he was with
VMware and checked in a fix to make sure that our 'vmrun' does not
immediately follow an 'sti':

        sti             /* Enable interrupts during guest execution */
        mov             svmPhysCurrentVMCB(%rip), %rax
        vmrun           /* Must not immediately follow STI. See PR 150935 */

PR 150935 describes exactly the same problem I am seeing with KVM.
For KVM the 'vmrun' is immediately after a 'sti' though:

        /* Enter guest mode */
        sti

1:      vmrun %rax

I confirmed that moving the 'sti' after the mov instruction in the
VMware code, so that it immediately precedes the 'vmrun', causes the
exact same ASSERT to fire.  I discussed this with Jim and Sean and they
suggested sending an e-mail to this list.  Jim also mentioned that this
was introduced in KVM by [1] a few years back.  It would be hard to
argue that this isn't an AMD bug, but it seems best to work around it
in SW.  It would be great if someone could fix this, but if folks are
too busy I can ask Zach to include it in the patches he is working on.

Thanks.
Doug

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm/svm/vmenter.S?id=fb0c4a4fee5a35b4e531b57e42231868d1fedb18

(gdb) p svmDbgExitRecs[5]->rip
$9 = 0x5b5f723
(gdb) p svmDbgExitRecs[5]->intrState
$10 = 0x0
(gdb) p svmDbgExitRecs[5]->exitCode
$11 = 0x7b
(gdb) p svmDbgExitRecs[5]->exitIntInfo
$12 = 0x0

(gdb) p svmDbgResumeRecs[5]->rip
$13 = 0x5b5f724
(gdb) p svmDbgResumeRecs[5]->intrState
$14 = 0x0
(gdb) p svmDbgResumeRecs[5]->exitCode
$15 = 0x7b
(gdb) p svmDbgResumeRecs[5]->exitIntInfo
$16 = 0x0
(gdb) p svmDbgResumeRecs[5]->eventInj
$22 = 0x80000068

(gdb) p svmDbgExitRecs[6]->rip
$17 = 0x5b5f724
(gdb) p svmDbgExitRecs[6]->intrState
$18 = 0x1  <<< should be 0
(gdb) p svmDbgExitRecs[6]->exitCode
$19 = 0x400
(gdb) p svmDbgExitRecs[6]->exitIntInfo
$20 = 0x80000068

   0x5b5f71f:   mov    %edx,%eax
   0x5b5f721:   mov    %ecx,%edx
   0x5b5f723:   out    %al,(%dx)
=> 0x5b5f724:   retq
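
For readers following the records above, the raw values can be decoded with a small helper.  This is only a sketch, not code from ESX or KVM.  The field layout (vector in bits 7:0, type in bits 10:8, valid in bit 31 of the low dword of EVENTINJ/EXITINTINFO), the exit codes 0x7b (IOIO) and 0x400 (NPF), and the interrupt shadow being bit 0 of the interrupt state follow the AMD APM.  The names decode_event, struct event_info and shadow_leaked are hypothetical, and the check is only modeled on the ASSERT described earlier, not copied from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exit codes per the AMD APM; the names mirror those used by KVM. */
#define SVM_EXIT_IOIO   0x07bu
#define SVM_EXIT_NPF    0x400u

/* Bit 0 of the VMCB interrupt-state field is the interrupt shadow. */
#define INT_SHADOW_MASK 0x1u

/* Hypothetical container for the low dword of EVENTINJ / EXITINTINFO. */
struct event_info {
	bool valid;       /* bit 31 */
	unsigned type;    /* bits 10:8, 0 = external or virtual interrupt */
	unsigned vector;  /* bits 7:0 */
};

static struct event_info decode_event(uint32_t raw)
{
	struct event_info e;

	e.valid = (raw >> 31) & 1u;
	e.type = (raw >> 8) & 0x7u;
	e.vector = raw & 0xffu;
	return e;
}

/*
 * Hypothetical check, modeled on the ASSERT described above: an exit
 * taken while delivering an injected interrupt (type 0) should not
 * also report an interrupt shadow.
 */
static bool shadow_leaked(uint32_t exit_int_info, uint32_t intr_state)
{
	struct event_info e = decode_event(exit_int_info);

	return e.valid && e.type == 0 && (intr_state & INT_SHADOW_MASK);
}
```

With the values from exit record 6 (exitIntInfo 0x80000068, intrState 0x1), decode_event() yields a valid type-0 event with vector 0x68 and shadow_leaked() returns true; with intrState 0x0 it returns false.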



* Re: Problem with vmrun in an interrupt shadow
  2025-02-07 12:44 Problem with vmrun in an interrupt shadow Doug Covelli
@ 2025-02-07 23:54 ` Sean Christopherson
  0 siblings, 0 replies; 2+ messages in thread
From: Sean Christopherson @ 2025-02-07 23:54 UTC (permalink / raw)
  To: Doug Covelli; +Cc: kvm

On Fri, Feb 07, 2025, Doug Covelli wrote:
> To test support for nested virtualization I was running a VM (L2) on a
> debug build of ESX (L1) on VMware Workstation/KVM (L0).  This
> consistently resulted in an ASSERT in L1 firing as the interrupt
> shadow bit in the VMCB was set on an #NPF exit that occurred when
> vectoring through the IDT to deliver an interrupt to L2.
> 
> Some details from our exit recorder are below.  Basically what
> happened is that L1 resumed L2 after handling an I/O exit and
> attempted to inject an internal interrupt with vector 0x68.  This
> resulted in a #NPF exit when vectoring through the IDT to deliver the
> interrupt to the guest with the interrupt shadow bit set which our
> code is not expecting.  There is no reason for the interrupt shadow
> bit to be set and neither L1 or L0 were setting it.
> 
> This turns out to be due to a quirk where on AMD 'vmrun' after an
> 'sti' will cause the interrupt shadow bit to leak into the guest state
> in the VMCB. Jim Mattson discovered this back when he was with VMware
> and checked in a fix to make sure that our 'vmrun' is not immediately
> after an 'sti':
> 
>         sti             /* Enable interrupts during guest execution */
>         mov             svmPhysCurrentVMCB(%rip), %rax
>         vmrun           /* Must not immediately follow STI. See PR 150935 */
> 
> PR 150935 describes exactly the same problem I am seeing with KVM.
> For KVM the 'vmrun' is immediately after a 'sti' though:
> 
>         /* Enter guest mode */
>         sti
> 
> 1:      vmrun %rax
> 
> I confirmed that moving the 'sti' after the mov instruction in the
> VMware code causes the same exact ASSERT to fire.  I discussed this
> with Jim and Sean and they suggested sending an e-mail to this list.
> Jim also mentioned that this was introduced by [1] a few years back.
> It would be hard to argue that this isn't an AMD bug but it seems best
> to workaround it in SW.  It would be great if someone could fix this
> but if folks are too busy I can ask Zach to include it in the patches
> he is working on.

I'll post a patch and a regression test.  It took me ~15 minutes to realize the
key is taking an exit while injecting an event, i.e. before executing anything
in the guest.  ~3 minutes to re-learn nested_exceptions_test.c, and 2 seconds
to add a testcase:

diff --git a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
index 3eb0313ffa39..3641a42934ac 100644
--- a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
@@ -85,6 +85,7 @@ static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
 
        GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
        GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
+       GUEST_ASSERT(!ctrl->int_state);
 }
 
 static void l1_svm_code(struct svm_test_data *svm)
@@ -122,6 +123,7 @@ static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
        GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
        GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
        GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
+       GUEST_ASSERT(!vmreadz(GUEST_INTERRUPTIBILITY_INFO));
 }
 
 static void l1_vmx_code(struct vmx_pages *vmx)


