* KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Yahya Sohail @ 2023-07-26 16:35 UTC
  To: kvm

Hi,

I'm trying to copy the state of an x86 emulator into a KVM VM.

I've loaded the relevant state (i.e. registers and memory) into a KVM VM 
and VCPU, and tried to do a KVM_RUN on the VCPU, but it fails with 
KVM_EXIT_FAIL_ENTRY and hardware_entry_failure_reason = 7. I looked 
through the KVM source and Intel manuals to determine that this either 
means that the CPU is in an interrupt window and the VM was set up to 
exit on an interrupt window, or that a VM entry occurred with invalid 
control fields. The former is not possible because my RFLAGS.IF = 0, 
meaning interrupts are currently disabled, so I think it's the latter.

Is it possible for someone using the KVM API to set the VMCS to an 
invalid state? If so, what fields in the kvm_run struct should I check 
that could cause such an issue?

Thanks,
Yahya Sohail


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Sean Christopherson @ 2023-07-26 17:17 UTC
  To: Yahya Sohail; +Cc: kvm

On Wed, Jul 26, 2023, Yahya Sohail wrote:
> Hi,
> 
> I'm trying to copy the state of an x86 emulator into a KVM VM.
> 
> I've loaded the relevant state (i.e. registers and memory) into a KVM VM and
> VCPU, and tried to do a KVM_RUN on the VCPU, but it fails with
> KVM_EXIT_FAIL_ENTRY and hardware_entry_failure_reason = 7. I looked through
> the KVM source and Intel manuals to determine that this either means that
> the CPU is in an interrupt window and the VM was set up to exit on an
> interrupt window, or that a VM entry occurred with invalid control fields.
> The former is not possible because my RFLAGS.IF = 0, meaning interrupts are
> currently disabled, so I think it's the latter.

No, there are far, far more possible problems.  Error code 7 is "invalid control
field", which is a gigantic bin for any failed consistency check that is related
to one or more VMCS control fields.

> Is it possible for someone using the KVM API to set the VMCS to an invalid
> state?

Yes.  Ideally it _shouldn't_ be possible[*], but practically speaking I don't think
there's ever been a version of KVM that prevents userspace from coercing KVM into
loading invalid state.  E.g. see https://lore.kernel.org/all/20230613203037.1968489-1-seanjc@google.com

[*] For VMCS control fields specifically.  Preventing userspace from loading
    invalid guest state is extremely difficult, and not something I realistically
    expect KVM to get 100% right anytime soon.

> If so, what fields in the kvm_run struct should I check that could cause such
> an issue?

Heh, all of them.  I'm only somewhat joking.  Root causing "invalid control field"
errors on bare metal is painfully difficult, bordering on impossible if you don't
have something to give you a hint as to what might be going wrong.

If you can, try running a nested setup, i.e. run a normal Linux guest as your L1
VM (L0 is bare metal), and then run your problematic x86 emulator VM within that
L1 guest (that's your L2).  Then, in L0 (your bare metal host), enable the
kvm_nested_vmenter_failed tracepoint.

The kvm_nested_vmenter_failed tracepoint logs all VM-Enter failures that _KVM_
detects when L1 attempts a nested VM-Enter from L1 to L2.  If you're at all lucky,
KVM in L0 (acting as the CPU from L1's perspective) will detect the invalid state
and explicitly log which consistency check failed.


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Yahya Sohail @ 2023-07-26 19:16 UTC
  To: Sean Christopherson; +Cc: kvm

On 7/26/23 12:17, Sean Christopherson wrote:
>> If so, what fields in the kvm_run struct should I check that could cause such
>> an issue?
> 
> Heh, all of them.  I'm only somewhat joking.  Root causing "invalid control field"
> errors on bare metal is painfully difficult, bordering on impossible if you don't
> have something to give you a hint as to what might be going wrong.

I suppose that's what I was expecting, but was hoping it could be 
narrowed down a bit. Could the values of the CPU control registers or 
other special registers set with KVM_SET_SREGS also cause this error 
(with hardware_entry_failure_reason = 7)? I'd expect this not to be 
possible because I don't think the CPU registers are part of the VMCS, 
but I'm not very familiar with VMX.

I do know that the emulator I'm copying state from likely doesn't 
consider all bits in the control fields, so it's possible that they're 
in an invalid state. When I ran the model before with the value for cr0 
copied out of the emulator I also got KVM_EXIT_FAIL_ENTRY, but with a 
different value for hardware_entry_failure_reason = 0x80000021. I fixed 
this by changing the value of cr0 to be (hopefully) valid.
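
A minimal sketch of how the two values seen so far (7 and 0x80000021) can be
told apart. It assumes that hardware_entry_failure_reason carries either a
VM-instruction error number or a full VMX exit reason with bit 31 (failed VM
entry) set; that is a reading of the KVM source, not a documented contract.
The function names are hypothetical, and only a few codes are decoded.

```c
#include <stdint.h>

/* Bit 31 of a VMX exit reason marks a failed VM entry; bits 15:0 hold
 * the basic exit reason. */
#define VMX_EXIT_REASON_FAILED_ENTRY (1u << 31)

/* 1 if the value looks like an exit reason with the failed-entry bit set,
 * 0 if it looks like a plain VM-instruction error number. */
int is_failed_entry_exit_reason(uint64_t reason)
{
    return (reason & VMX_EXIT_REASON_FAILED_ENTRY) != 0;
}

uint16_t basic_exit_reason(uint64_t reason)
{
    return (uint16_t)(reason & 0xffff);
}

const char *describe_entry_failure(uint64_t reason)
{
    if (is_failed_entry_exit_reason(reason)) {
        switch (basic_exit_reason(reason)) {
        case 33: return "VM-entry failure due to invalid guest state";
        case 34: return "VM-entry failure due to MSR loading";
        default: return "VM-entry failure (other basic exit reason)";
        }
    }
    /* Otherwise treat the value as a VM-instruction error number. */
    if (reason == 7)
        return "VM entry with invalid control field(s)";
    if (reason == 8)
        return "VM entry with invalid host-state field(s)";
    return "other VM-instruction error";
}
```

With this split, 7 lands in the control-field bin and 0x80000021 decodes to
basic exit reason 33, i.e. a guest-state check failed.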

> If you can, try running a nested setup, i.e. run a normal Linux guest as your L1
> VM (L0 is bare metal), and then run your problematic x86 emulator VM within that
> L1 guest (that's your L2).  Then, in L0 (your bare metal host), enable the
> kvm_nested_vmenter_failed tracepoint.
> 
> The kvm_nested_vmenter_failed tracepoint logs all VM-Enter failures that _KVM_
> detects when L1 attempts a nested VM-Enter from L1 to L2.  If you're at all lucky,
> KVM in L0 (acting as the CPU from L1's perspective) will detect the invalid state
> and explicitly log which consistency check failed.

I did this and had an interesting result. Instead of exiting with 
KVM_EXIT_FAIL_ENTRY, it exited with KVM_EXIT_UNKNOWN, and 
hardware_exit_reason = 0. I also didn't get anything logged from the 
kvm_nested_vmenter_failed trace point. When I checked the value of rip 
after KVM_RUN, it was the same as the starting value, so it probably 
failed without executing any instructions.

I then tried setting the kvm_nested_vmexit tracepoint to see if I could 
get any more information about the vmexit. When the vmexit occurred, I 
got a line in the log that looked like this:

CPU 3/KVM-9310    [013] ....  6076.453278: kvm_nested_vmexit: vcpu 3 
reason EPT_VIOLATION rip 0x103c00 info1 0x0000000000000781 info2 
0x000000008000030d intr_info 0x00000000 error_code 0x00000000

It appears this occurred due to an EPT_VIOLATION. I have some questions:
I believe an EPT_VIOLATION is caused by trying to access physical memory 
that is not mapped. Is that correct? Also, could this be the same error 
that causes the KVM_EXIT_FAIL_ENTRY when running the VM as L1, or must 
that be a separate issue?

I know that the paging code of the emulator the state is from is a 
little suspect (in fact, one of my reasons to get this VM working in KVM 
is to help debug the emulator), and it is possible that the page tables 
of the VM are not set up properly and are mapping linear addresses to 
unexpected physical addresses and causing an EPT_VIOLATION. I'll have to 
look into that further.

Thanks for the help,
Yahya Sohail


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Sean Christopherson @ 2023-07-26 19:51 UTC
  To: Yahya Sohail; +Cc: kvm

On Wed, Jul 26, 2023, Yahya Sohail wrote:
> On 7/26/23 12:17, Sean Christopherson wrote:
> > > If so, what fields in the kvm_run struct should I check that could cause such
> > > an issue?
> > 
> > Heh, all of them.  I'm only somewhat joking.  Root causing "invalid control field"
> > errors on bare metal is painfully difficult, bordering on impossible if you don't
> > have something to give you a hint as to what might be going wrong.
> 
> I suppose that's what I was expecting, but was hoping it could be narrowed
> down a bit. Could the values of the CPU control registers or other special
> registers set with KVM_SET_SREGS also cause this error (with
> hardware_entry_failure_reason = 7)? I'd expect this not to be possible
> because I don't think the CPU registers are part of the VMCS, but I'm not
> very familiar with VMX.
> 
> I do know that the emulator I'm copying state from likely doesn't consider
> all bits in the control fields, so it's possible that they're in an invalid
> state. When I ran the model before with the value for cr0 copied out of the
> emulator I also got KVM_EXIT_FAIL_ENTRY, but with a different value for
> hardware_entry_failure_reason = 0x80000021. I fixed this by changing the
> value of cr0 to be (hopefully) valid.

What were the before and after values of CR0?

> > If you can, try running a nested setup, i.e. run a normal Linux guest as your L1
> > VM (L0 is bare metal), and then run your problematic x86 emulator VM within that
> > L1 guest (that's your L2).  Then, in L0 (your bare metal host), enable the
> > kvm_nested_vmenter_failed tracepoint.
> > 
> > The kvm_nested_vmenter_failed tracepoint logs all VM-Enter failures that _KVM_
> > detects when L1 attempts a nested VM-Enter from L1 to L2.  If you're at all lucky,
> > KVM in L0 (acting as the CPU from L1's perspective) will detect the invalid state
> > and explicitly log which consistency check failed.
> 
> I did this and had an interesting result. Instead of exiting with
> KVM_EXIT_FAIL_ENTRY, it exited with KVM_EXIT_UNKNOWN, and
> hardware_exit_reason = 0.

Hrm, what kernel version are you running as L1?  KVM on x86 doesn't explicitly
return KVM_EXIT_UNKNOWN except in a few paths that I highly doubt you are hitting.

> I also didn't get anything logged from the kvm_nested_vmenter_failed trace
> point. When I checked the value of rip after KVM_RUN, it was the same as the
> starting value, so it probably failed without executing any instructions.
> 
> I then tried setting the kvm_nested_vmexit tracepoint to see if I could get
> any more information about the vmexit. When the vmexit occurred, I got a
> line in the log that looked like this:
> 
> CPU 3/KVM-9310    [013] ....  6076.453278: kvm_nested_vmexit: vcpu 3 reason
> EPT_VIOLATION rip 0x103c00 info1 0x0000000000000781 info2 0x000000008000030d
> intr_info 0x00000000 error_code 0x00000000

So getting an EPT violation VM-Exit means the VM-Entry was successful.  Are you
running different kernel versions for L0 versus L1?  If so, it's possible that
there's a bug (or bug fix) in one kernel and not the other.

> It appears this occurred due to an EPT_VIOLATION. I have some questions:
> I believe an EPT_VIOLATION is caused by trying to access physical memory
> that is not mapped. Is that correct?

Yep.  The "info1 0x0000000000000781" from above is the EXIT_QUALIFICATION field,
which for EPT violations is equivalent to a #PF error code.  0x781 means a read
access faulted and the mapping was !present, e.g. as opposed to the mapping
being !readable (EPT supports execute-only mappings).

The other interesting bit is "info2 0x000000008000030d", which is the vectoring
info.  That value means that the EPT violation occurred while the CPU was trying
to deliver a #GP in the guest.  In and of itself, that isn't fatal, but it does
suggest that something might be going wrong in the emulator.
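
To make the two values above easier to read, here is a small sketch that
splits them into fields. The bit positions follow the SDM layouts for the EPT
violation exit qualification and the IDT-vectoring information field as I
understand them; the struct and function names are made up for illustration,
and only the bits relevant to this trace are decoded.

```c
#include <stdint.h>

struct ept_violation {
    int read, write, fetch;     /* bits 0..2: type of the faulting access */
    int ept_r, ept_w, ept_x;    /* bits 3..5: permissions of the EPT mapping */
    int gla_valid;              /* bit 7: guest linear address is valid */
};

struct ept_violation decode_ept_qual(uint64_t q)
{
    struct ept_violation v = {
        .read      = (int)((q >> 0) & 1),
        .write     = (int)((q >> 1) & 1),
        .fetch     = (int)((q >> 2) & 1),
        .ept_r     = (int)((q >> 3) & 1),
        .ept_w     = (int)((q >> 4) & 1),
        .ept_x     = (int)((q >> 5) & 1),
        .gla_valid = (int)((q >> 7) & 1),
    };
    return v;
}

struct vectoring_info {
    int valid;              /* bit 31 */
    unsigned type;          /* bits 10:8, 3 = hardware exception */
    unsigned vector;        /* bits 7:0, 13 = #GP */
    int has_error_code;     /* bit 11 */
};

struct vectoring_info decode_vectoring_info(uint32_t info)
{
    struct vectoring_info v = {
        .valid          = (int)((info >> 31) & 1),
        .type           = (info >> 8) & 0x7,
        .vector         = info & 0xff,
        .has_error_code = (int)((info >> 11) & 1),
    };
    return v;
}
```

For 0x781 this gives a read access with all three permission bits clear (not
present), and for 0x8000030d a valid hardware exception with vector 13.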

> Also, could this be the same error that causes the KVM_EXIT_FAIL_ENTRY when
> running the VM as L1, or must that be a separate issue?

Maybe?  EPT violations themselves are not errors (ignore the "violation" part,
it's not as scary as it sounds).  But if the exit to userspace is related to the
EPT violation, I would expect KVM_EXIT_MMIO, not KVM_EXIT_UNKNOWN.

> I know that the paging code of the emulator the state is from is a little
> suspect (in fact, one of my reasons to get this VM working in KVM is to help
> debug the emulator), and it is possible that the page tables of the VM are
> not set up properly and are mapping linear addresses to unexpected physical
> addresses and causing an EPT_VIOLATION. I'll have to look into that further.

Turn on the kvm_page_fault tracepoint, that will give the gpa on which the fault
occurs.


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Yahya Sohail @ 2023-07-26 22:14 UTC
  To: Sean Christopherson; +Cc: kvm

On 7/26/23 14:51, Sean Christopherson wrote:
> On Wed, Jul 26, 2023, Yahya Sohail wrote:
>> On 7/26/23 12:17, Sean Christopherson wrote:
>> I do know that the emulator I'm copying state from likely doesn't consider
>> all bits in the control fields, so it's possible that they're in an invalid
>> state. When I ran the model before with the value for cr0 copied out of the
>> emulator I also got KVM_EXIT_FAIL_ENTRY, but with a different value for
>> hardware_entry_failure_reason = 0x80000021. I fixed this by changing the
>> value of cr0 to be (hopefully) valid.
> 
> What were the before and after values of CR0?

Before, CR0 was 0x80000000. It appears the paging bit was not set even 
after I "fixed" cr0. I have now made sure the paging bit and the fixed 
bits are properly set in CR0. CR0 is now equal to 0x8393870b and I get 
KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 0x80000021 
whether I run it as L1 or L2.

I'm also now getting this tracepoint log on my L0 when I run the VM as L2:
CPU 12/KVM-9319    [014] .... 17072.747744: kvm_nested_vmexit: vcpu 12 
reason INVALID_STATE FAILED_VMENTRY rip 0x103c00 info1 
0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 
error_code 0x00000000

>>> If you can, try running a nested setup, i.e. run a normal Linux guest as your L1
>>> VM (L0 is bare metal), and then run your problematic x86 emulator VM within that
>>> L1 guest (that's your L2).  Then, in L0 (your bare metal host), enable the
>>> kvm_nested_vmenter_failed tracepoint.
>>>
>>> The kvm_nested_vmenter_failed tracepoint logs all VM-Enter failures that _KVM_
>>> detects when L1 attempts a nested VM-Enter from L1 to L2.  If you're at all lucky,
>>> KVM in L0 (acting as the CPU from L1's perspective) will detect the invalid state
>>> and explicitly log which consistency check failed.
>>
>> I did this and had an interesting result. Instead of exiting with
>> KVM_EXIT_FAIL_ENTRY, it exited with KVM_EXIT_UNKNOWN, and
>> hardware_exit_reason = 0.
> 
> Hrm, what kernel version are you running as L1?  KVM on x86 doesn't explicitly
> return KVM_EXIT_UNKNOWN except in a few paths that I highly doubt you are hitting.
> 
>> I also didn't get anything logged from the kvm_nested_vmenter_failed trace
>> point. When I checked the value of rip after KVM_RUN, it was the same as the
>> starting value, so it probably failed without executing any instructions.
>>
>> I then tried setting the kvm_nested_vmexit tracepoint to see if I could get
>> any more information about the vmexit. When the vmexit occurred, I got a
>> line in the log that looked like this:
>>
>> CPU 3/KVM-9310    [013] ....  6076.453278: kvm_nested_vmexit: vcpu 3 reason
>> EPT_VIOLATION rip 0x103c00 info1 0x0000000000000781 info2 0x000000008000030d
>> intr_info 0x00000000 error_code 0x00000000
> 
> So getting an EPT violation VM-Exit means the VM-Entry was successful.  Are you
> running different kernel versions for L0 versus L1?  If so, it's possible that
> there's a bug (or bug fix) in one kernel and not the other.

My L0 is on 5.10.186, and L1 is on 6.1.30.

>> It appears this occurred due to an EPT_VIOLATION. I have some questions:
>> I believe an EPT_VIOLATION is caused by trying to access physical memory
>> that is not mapped. Is that correct?
> 
> Yep.  The "info1 0x0000000000000781" from above is the EXIT_QUALIFICATION field,
> which for EPT violations is equivalent to a #PF error code.  0x781 means a read
> access faulted and the mapping was !present, e.g. as opposed to the mapping
> being !readable (EPT supports execute-only mappings).
> 
> The other interesting bit is "info2 0x000000008000030d", which is the vectoring
> info.  That value means that the EPT violation occurred while the CPU was trying
> to deliver a #GP in the guest.  In and of itself, that isn't fatal, but it does
> suggest that something might be going wrong in the emulator.

The state the VM is in is supposed to be the beginning of Linux boot 
(i.e. the bootloader has just jumped into the Linux entrypoint). Thus, 
the IDT is not yet set up, so I would expect there to be some errors if 
the CPU attempted to deliver a #GP to the guest.

>> Also, could this be the same error that causes the KVM_EXIT_FAIL_ENTRY when
>> running the VM as L1, or must that be a separate issue?
> 
> Maybe?  EPT violations themselves are not errors (ignore the "violation" part,
> it's not as scary as it sounds).  But if the exit to userspace is related to the
> EPT violation, I would expect KVM_EXIT_MMIO, not KVM_EXIT_UNKNOWN.
> 
>> I know that the paging code of the emulator the state is from is a little
>> suspect (in fact, one of my reasons to get this VM working in KVM is to help
>> debug the emulator), and it is possible that the page tables of the VM are
>> not set up properly and are mapping linear addresses to unexpected physical
>> addresses and causing an EPT_VIOLATION. I'll have to look into that further.

I thought the EPT violation occurred when fetching the instruction, but 
if I use KVM_TRANSLATE to translate RIP into a physical address, it 
appears to translate it to the physical address I'd expect (and one 
which should be mapped). Given that the RIP address translates correctly 
in KVM, I don't think the paging system of the emulator is to blame for 
the issue.

> Turn on the kvm_page_fault tracepoint, that will give the gpa on which the fault
> occurs.

Prior to my new fix for CR0, in the log after enabling this trace point, 
I saw there was a page fault for reading the address 0x34 because it was 
not present. I think this was because the IDT address was set to 0, and 
the interrupt gate for a #GP would therefore be at address 0x34.

It seems the reason for getting a hardware_entry_failure_reason = 7 when 
running the VM as L1 was the same as the reason for getting a #GP when 
running the VM as L2. CR0 was invalid, and that caused a 
KVM_EXIT_FAIL_ENTRY when running as L1 and a #GP when running as L2. In 
the latter case, that led to a page fault because the IDT was not yet 
present.

That being said, I'm still not sure how to go about debugging the 
KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 0x80000021. The 
entry in the tracepoint log (see above) of L0 (when running the VM as 
L2) does not seem to be very helpful (unlike the invalid CR0 messages I 
got before when CR0 was invalid). Is there any more information that can 
be gleaned from this log entry? Any other way to get more information as 
to what piece of state is invalid?

Thanks,
Yahya Sohail


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Sean Christopherson @ 2023-07-27 16:52 UTC
  To: Yahya Sohail; +Cc: kvm

On Wed, Jul 26, 2023, Yahya Sohail wrote:
> On 7/26/23 14:51, Sean Christopherson wrote:
> > On Wed, Jul 26, 2023, Yahya Sohail wrote:
> > > On 7/26/23 12:17, Sean Christopherson wrote:
> > > I do know that the emulator I'm copying state from likely doesn't consider
> > > all bits in the control fields, so it's possible that they're in an invalid
> > > state. When I ran the model before with the value for cr0 copied out of the
> > > emulator I also got KVM_EXIT_FAIL_ENTRY, but with a different value for
> > > hardware_entry_failure_reason = 0x80000021. I fixed this by changing the
> > > value of cr0 to be (hopefully) valid.
> > 
> > What were the before and after values of CR0?
> 
> Before, CR0 was 0x80000000.

Yeah, CR0.PG=1 with CR0.PE=0 (paging without protected mode) is invalid.  But
KVM fails to reject this combination from userspace without the series/patch I
linked earlier.  That would explain why the VM-Enter fails instead of KVM rejecting
KVM_SET_SREGS.

https://lore.kernel.org/all/20230613203037.1968489-1-seanjc@google.com

> It appears the paging bit was not set even after
> I "fixed" cr0. I have now made sure the paging bit and the fixed bits are
> properly set in CR0. CR0 is now equal to 0x8393870b

How did you get that value?   AFAICT, it's not outright illegal, but only because
the CPU ignores reserved bits in CR0[31:0], as opposed to rejecting them.
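
Both CR0 observations above can be checked mechanically. The sketch below
flags paging without protected mode and reports which bits of a value fall
outside the architecturally defined CR0 bits. The macro and function names are
illustrative only, and this is not a full model of what VM-Entry or
KVM_SET_SREGS accepts.

```c
#include <stdint.h>

#define CR0_PE (1ull << 0)
#define CR0_MP (1ull << 1)
#define CR0_EM (1ull << 2)
#define CR0_TS (1ull << 3)
#define CR0_ET (1ull << 4)
#define CR0_NE (1ull << 5)
#define CR0_WP (1ull << 16)
#define CR0_AM (1ull << 18)
#define CR0_NW (1ull << 29)
#define CR0_CD (1ull << 30)
#define CR0_PG (1ull << 31)

#define CR0_DEFINED_BITS (CR0_PE | CR0_MP | CR0_EM | CR0_TS | CR0_ET | \
                          CR0_NE | CR0_WP | CR0_AM | CR0_NW | CR0_CD | CR0_PG)

/* Paging enabled without protected mode is an invalid combination. */
int cr0_pg_without_pe(uint64_t cr0)
{
    return (cr0 & CR0_PG) && !(cr0 & CR0_PE);
}

/* Returns the bits of cr0 that are not architecturally defined. */
uint64_t cr0_reserved_bits(uint64_t cr0)
{
    return cr0 & ~CR0_DEFINED_BITS;
}
```

Run against the values in this thread, 0x80000000 trips the PG-without-PE
check, while 0x8393870b passes that check but has reserved bits set.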

> That being said, I'm still not sure how to go about debugging the
> KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 0x80000021. The
> entry in the tracepoint log (see above) of L0 (when running the VM as L2)
> does not seem to be very helpful (unlike the invalid CR0 messages I got
> before when CR0 was invalid). Is there any more information that can be
> gleaned from this log entry?

Probably not.

> Any other way to get more information as to what piece of state is invalid?

Enable /sys/module/kvm_intel/parameters/dump_invalid_vmcs, then KVM will print
out most (all?) VMCS fields on the failed VM-Entry.  From there you'll have to
hunt through guest state to figure out which fields, or combinations of fields,
are invalid.
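
If it is more convenient to flip the parameter from the same program that
drives the VM, a sketch like the following could work. It assumes the process
has permission to write the file (typically root) and that writing "1" enables
the boolean parameter; write_param and enable_dump_invalid_vmcs are
hypothetical helper names.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_param(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(value, f) >= 0;
    /* fclose flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Path taken from the message above. */
int enable_dump_invalid_vmcs(void)
{
    return write_param("/sys/module/kvm_intel/parameters/dump_invalid_vmcs",
                       "1");
}
```

The dump itself goes to the kernel log, so it still has to be read from there
after the failed KVM_RUN.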


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Yahya Sohail @ 2023-07-28 17:45 UTC
  To: Sean Christopherson; +Cc: kvm


On 7/27/23 11:52, Sean Christopherson wrote:
> On Wed, Jul 26, 2023, Yahya Sohail wrote:
>> It appears the paging bit was not set even after
>> I "fixed" cr0. I have now made sure the paging bit and the fixed bits are
>> properly set in CR0. CR0 is now equal to 0x8393870b
> 
> How did you get that value?   AFAICT, it's not outright illegal, but only because
> the CPU ignores reserved bits in CR0[31:0], as opposed to rejecting them.

Turns out that cr0 value was still erroneous. Not certain what happened, 
but I think I may have interpreted a hex number as a decimal number at 
some point while computing it. I've updated cr0 to be 0xe0000011. I got 
this value by reading the default value of the cr0 register by calling 
KVM_GET_SREGS on a newly created VM and then setting the bits I need to 
be enabled. The issue continues to persist.

> Enable /sys/module/kvm_intel/parameters/dump_invalid_vmcs, then KVM will print
> out most (all?) VMCS fields on the failed VM-Entry.  From there you'll have to
> hunt through guest state to figure out which fields, or combinations of fields,
> are invalid.

I get the following output in my log:
*** Guest State ***
CR0: actual=0x0000000080000031, shadow=0x00000000e0000011, gh_mask=fffffffffffffff7
CR4: actual=0x0000000000002060, shadow=0x0000000000000020, gh_mask=fffffffffffef871
CR3 = 0x0000000010000000
PDPTR0 = 0x0000000000000000  PDPTR1 = 0x0000000000000000
PDPTR2 = 0x0000000000000000  PDPTR3 = 0x0000000000000000
RSP = 0x0000000002000031  RIP = 0x0000000000103c00
RFLAGS=0x00000002         DR7 = 0x0000000000000400
Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
CS:   sel=0x0010, attr=0x0a09b, limit=0x000fffff, base=0x0000000000000000
DS:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
SS:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
ES:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
FS:   sel=0x0000, attr=0x10001, limit=0x00000000, base=0x0000000000000000
GS:   sel=0x0000, attr=0x10001, limit=0x00000000, base=0x0000000000000000
GDTR:                           limit=0x00001031, base=0x001f000000000200
LDTR: sel=0x0000, attr=0x10000, limit=0x000003e8, base=0x000003e8000081a4
IDTR:                           limit=0x00000000, base=0x0000000000000000
TR:   sel=0x0000, attr=0x10021, limit=0x01211091, base=0x0000000000010304
EFER =     0x0000000000000500  PAT = 0x0007040600070406
DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
BndCfgS = 0x0000000000000000
Interruptibility = 00000000  ActivityState = 00000000
*** Host State ***
RIP = 0xffffffffc09a444f  RSP = 0xffffa3aa0d2fbd50
CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
FSBase=00007feb463e9740 GSBase=ffff93bb3dd80000 TRBase=fffffe000033d000
GDTBase=fffffe000033b000 IDTBase=fffffe0000000000
CR0=0000000080050033 CR3=0000000547886004 CR4=00000000007726e0
Sysenter RSP=fffffe000033d000 CS:RIP=0010:ffffffff910016e0
EFER = 0x0000000000000d01  PAT = 0x0407050600070106
*** Control State ***
PinBased=0000007f CPUBased=b5986dfa SecondaryExec=00032ce2
EntryControls=0001d3ff ExitControls=00abefff
ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
VMEntry: intr_info=80000044 errcode=00000000 ilen=00000000
VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
         reason=80000021 qualification=0000000000000000
IDTVectoring: info=00000000 errcode=00000000
TSC Offset = 0xfffda9968024bdbc
EPT pointer = 0x0000000103f1905e
PLE Gap=00000080 Window=00001000
Virtual processor ID = 0x0021

For the control registers, the value I set is shown as shadow not 
actual. What does that mean?

Additionally, I consulted the Intel manual for the meaning of the 
0x80000021 error code, and it appears it is caused by not meeting one or 
more of the requirements for guest state set in Volume 3 Section 27.3.1 
of the Intel manual. I noticed that there are certain requirements for 
tr and ldtr registers even though they are not really used in IA-32e 
mode (see Volume 3 Section 27.3.1.2). I've tried setting them as follows 
to meet those requirements, but that didn't seem to do anything:
   state->sregs.tr.type = 0b11;
   state->sregs.tr.s = 0;
   state->sregs.tr.present = 1;
   state->sregs.tr.g = 1;
   state->sregs.tr.limit = 0xFFFFF;
   state->sregs.tr.unusable = 0;
   state->sregs.tr.selector = 0b100;
   state->sregs.ldt.unusable = 1;

Does that look correct?

I suppose my best course of action now is to go through each check in 
Volume 3 Section 27.3.1 and check each one individually.

Thanks,
Yahya Sohail


* Re: KVM_EXIT_FAIL_ENTRY with hardware_entry_failure_reason = 7
From: Sean Christopherson @ 2023-08-02 19:04 UTC
  To: Yahya Sohail; +Cc: kvm

On Fri, Jul 28, 2023, Yahya Sohail wrote:
> 
> On 7/27/23 11:52, Sean Christopherson wrote:
> > Enable /sys/module/kvm_intel/parameters/dump_invalid_vmcs, then KVM will print
> > out most (all?) VMCS fields on the failed VM-Entry.  From there you'll have to
> > hunt through guest state to figure out which fields, or combinations of fields,
> > are invalid.
> 
> I get the following output in my log:
> *** Guest State ***
> CR0: actual=0x0000000080000031, shadow=0x00000000e0000011, gh_mask=fffffffffffffff7
> CR4: actual=0x0000000000002060, shadow=0x0000000000000020, gh_mask=fffffffffffef871
> CR3 = 0x0000000010000000
> PDPTR0 = 0x0000000000000000  PDPTR1 = 0x0000000000000000
> PDPTR2 = 0x0000000000000000  PDPTR3 = 0x0000000000000000
> RSP = 0x0000000002000031  RIP = 0x0000000000103c00
> RFLAGS=0x00000002         DR7 = 0x0000000000000400
> Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
> CS:   sel=0x0010, attr=0x0a09b, limit=0x000fffff, base=0x0000000000000000
> DS:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
> SS:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
> ES:   sel=0x0018, attr=0x08093, limit=0x000fffff, base=0x0000000000000000
> FS:   sel=0x0000, attr=0x10001, limit=0x00000000, base=0x0000000000000000
> GS:   sel=0x0000, attr=0x10001, limit=0x00000000, base=0x0000000000000000
> GDTR:                           limit=0x00001031, base=0x001f000000000200

The GDT base is non-canonical, that's likely the direct source of the failed
consistency check.
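
A quick way to test an address for canonicality, as a sketch: the upper bits
from (vaddr_bits - 1) through 63 must be all zeros or all ones. The helper
name is hypothetical, and the 48-bit width used below is an assumption about
the CPU (no 5-level paging).

```c
#include <stdint.h>

/* Returns 1 if addr is canonical for the given linear-address width
 * (e.g. 48 for 4-level paging), 0 otherwise. */
int is_canonical(uint64_t addr, unsigned vaddr_bits)
{
    /* Keep bits 63..(vaddr_bits - 1); they must all be equal. */
    uint64_t upper = addr >> (vaddr_bits - 1);
    uint64_t all_ones = (1ull << (64 - vaddr_bits + 1)) - 1;

    return upper == 0 || upper == all_ones;
}
```

With 48 bits, the guest GDTR base 0x001f000000000200 from the dump fails this
test, while the host GDTBase fffffe000033b000 passes.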

> LDTR: sel=0x0000, attr=0x10000, limit=0x000003e8, base=0x000003e8000081a4
> IDTR:                           limit=0x00000000, base=0x0000000000000000
> TR:   sel=0x0000, attr=0x10021, limit=0x01211091, base=0x0000000000010304
> EFER =     0x0000000000000500  PAT = 0x0007040600070406
> DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
> BndCfgS = 0x0000000000000000
> Interruptibility = 00000000  ActivityState = 00000000
> *** Host State ***
> RIP = 0xffffffffc09a444f  RSP = 0xffffa3aa0d2fbd50
> CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
> FSBase=00007feb463e9740 GSBase=ffff93bb3dd80000 TRBase=fffffe000033d000
> GDTBase=fffffe000033b000 IDTBase=fffffe0000000000
> CR0=0000000080050033 CR3=0000000547886004 CR4=00000000007726e0
> Sysenter RSP=fffffe000033d000 CS:RIP=0010:ffffffff910016e0
> EFER = 0x0000000000000d01  PAT = 0x0407050600070106
> *** Control State ***
> PinBased=0000007f CPUBased=b5986dfa SecondaryExec=00032ce2
> EntryControls=0001d3ff ExitControls=00abefff
> ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
> VMEntry: intr_info=80000044 errcode=00000000 ilen=00000000
> VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
>         reason=80000021 qualification=0000000000000000
> IDTVectoring: info=00000000 errcode=00000000
> TSC Offset = 0xfffda9968024bdbc
> EPT pointer = 0x0000000103f1905e
> PLE Gap=00000080 Window=00001000
> Virtual processor ID = 0x0021
> 
> For the control registers, the value I set is shown as shadow not actual.
> What does that mean?

The "shadow" is what the guest sees, the "actual" value is what is loaded in
hardware.  VMX virtualization of CR0 and CR4, through a combination of VMCS fields
(actual + shadow + gh_mask in the above terminology), effectively allows intercepting
writes to individual CR0 and CR4 bits, and entirely avoids VM-Exits on reads of
CR0/CR4.

Copying from the SDM:

  MOV from CR4. The behavior of MOV from CR4 is determined by the CR4 guest/host
  mask and the CR4 read shadow. For each position corresponding to a bit clear in
  the CR4 guest/host mask, the destination operand is loaded with the value of the
  corresponding bit in CR4. For each position corresponding to a bit set in the CR4
  guest/host mask, the destination operand is loaded with the value of the corresponding
  bit in the CR4 read shadow. Thus, if every bit is cleared in the CR4 guest/host
  mask, MOV from CR4 reads normally from CR4; if every bit is set in the CR4 guest/host
  mask, MOV from CR4 returns the value of the CR4 read shadow.  Depending on the
  contents of the CR4 guest/host mask and the CR4 read shadow, bits may be set in the
  destination that would never be set when reading directly from CR4.

 ...

  MOV to CR4. An execution of MOV to CR4 that does not cause a VM exit (see Section
  26.1.3) leaves unmodified any bit in CR4 corresponding to a bit set in the CR4
  guest/host mask. Such an execution causes a general-protection exception if it
  attempts to set any bit in CR4 (not corresponding to a bit set in the CR4
  guest/host mask) to a value not supported in VMX operation (see Section 24.8).
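
The MOV-from-CR rule quoted above boils down to a per-bit select, sketched
here. The function name is made up; the three inputs correspond to the
actual, shadow and gh_mask values in the VMCS dump.

```c
#include <stdint.h>

/* Value a guest reads from CR0/CR4: bits set in the guest/host mask come
 * from the read shadow, all other bits come from the real register. */
uint64_t guest_visible_cr(uint64_t actual, uint64_t shadow, uint64_t gh_mask)
{
    return (shadow & gh_mask) | (actual & ~gh_mask);
}
```

Plugging in the dump values, the guest would read CR0 as 0xe0000011 and CR4
as 0x20, i.e. the values userspace set, even though hardware runs with
0x80000031 and 0x2060.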

> Additionally, I consulted the Intel manual for the meaning of the 0x80000021
> error code, and it appears it is caused by not meeting one or more of the
> requirements for guest state set in Volume 3 Section 27.3.1 of the Intel
> manual.

Yep.

> I noticed that there are certain requirements for tr and ldtr
> registers even though they are not really used in IA-32e mode (see Volume 3
> Section 27.3.1.2). I've tried setting them as follows to meet those
> requirements, but that didn't seem to do anything:
>   state->sregs.tr.type = 0b11;
>   state->sregs.tr.s = 0;
>   state->sregs.tr.present = 1;
>   state->sregs.tr.g = 1;
>   state->sregs.tr.limit = 0xFFFFF;
>   state->sregs.tr.unusable = 0;
>   state->sregs.tr.selector = 0b100;
>   state->sregs.ldt.unusable = 1;
> 
> Does that look correct?

Maybe?  Sorry, I've essentially exhausted my bandwidth for helping this along
beyond quick comments.
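
For anyone picking this up later, here is a hedged sketch of the TR checks
from the SDM section on guest segment registers, as I read them: TI flag
clear, type 11 for an IA-32e guest (3 or 11 otherwise), S = 0, P = 1, G
consistent with the limit, and not unusable. struct tr_fields is a
hypothetical stand-in for the kvm_segment fields used above, and the flag
names are invented. Treating the guest as IA-32e is an assumption based on
EFER = 0x500 in the dump.

```c
#include <stdint.h>

/* Hypothetical mirror of the kvm_segment fields set in the snippet above. */
struct tr_fields {
    uint32_t limit;
    uint16_t selector;
    uint8_t  type, s, present, g, unusable;
};

enum tr_problem {
    TR_BAD_TI      = 1 << 0,  /* selector bit 2 (TI) must be 0 */
    TR_BAD_TYPE    = 1 << 1,  /* type must be a busy TSS type */
    TR_BAD_S       = 1 << 2,  /* S must be 0 (system segment) */
    TR_NOT_PRESENT = 1 << 3,  /* P must be 1 */
    TR_BAD_G       = 1 << 4,  /* G must be consistent with the limit */
    TR_UNUSABLE    = 1 << 5,  /* TR must be usable */
};

/* Returns a bitmask of enum tr_problem values, 0 if all checks pass. */
unsigned check_tr(const struct tr_fields *tr, int ia32e_guest)
{
    unsigned bad = 0;

    if (tr->selector & 0x4)
        bad |= TR_BAD_TI;
    if (ia32e_guest ? tr->type != 11 : (tr->type != 3 && tr->type != 11))
        bad |= TR_BAD_TYPE;
    if (tr->s)
        bad |= TR_BAD_S;
    if (!tr->present)
        bad |= TR_NOT_PRESENT;
    /* Any zero bit in limit[11:0] requires G = 0. */
    if ((tr->limit & 0xfff) != 0xfff && tr->g)
        bad |= TR_BAD_G;
    /* Any set bit in limit[31:20] requires G = 1. */
    if ((tr->limit & 0xfff00000u) && !tr->g)
        bad |= TR_BAD_G;
    if (tr->unusable)
        bad |= TR_UNUSABLE;
    return bad;
}
```

Under these assumptions, the settings quoted above would trip two checks:
selector 0b100 has the TI flag set, and type 0b11 is not the 64-bit busy TSS
type (11).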

> I suppose my best course of action now is to go through each check in Volume
> 3 Section 27.3.1 and check each one individually.


