[BUG]: Crashing Xen when nestedhvm is enabled

* [BUG]: Crashing Xen when nestedhvm is enabled
@ 2025-10-25 13:44 Patrick
  2025-11-11 11:22 ` Andrew Cooper
From: Patrick @ 2025-10-25 13:44 UTC
  To: xen-devel

Dear all,

I think I have found a way for a guest to crash the hypervisor,
when hardware nesting is enabled and we are running on Intel using VMX.

This is done by executing the following steps in the non-nested guest:

- Enable VMX and make a vmcs active
- Overwrite the revision id of the active vmcs with any invalid id, using a plain memory write
- Call vmwrite to write the `MSR_BITMAP` field

Basically this:
```C
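vmxon();
vmcs = alloc();
/* A valid revision id is needed for vmptrld to succeed. */
*(uint32_t*)vmcs = correct_vmcs_revision_id;
vmptrld(vmcs);
/* Corrupt the revision id of the now-active vmcs. */
*(uint32_t*)vmcs = invalid_vmcs_revision_id;
/* Writes to MSR_BITMAP are intercepted and emulated by Xen. */
vmwrite(MSR_BITMAP, 0);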
```

The `vmptrld` will set the provided vmcs as the link pointer as seen in
`xen/arch/x86/hvm/vmx/vvmx.c:1834`
```
if ( cpu_has_vmx_vmcs_shadowing )
    nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
```

If the guest now calls `VMWRITE` it will access that vmcs directly,
except when writing/reading the `IO_BITMAP` or the `MSR_BITMAP`:

`xen/arch/x86/hvm/vmx/vvmx.c:107`
```
/*
* For the following 6 encodings, we need to handle them in VMM.
* Let them vmexit as usual.
*/
set_bit(IO_BITMAP_A, vw);
set_bit(VMCS_HIGH(IO_BITMAP_A), vw);
set_bit(IO_BITMAP_B, vw);
set_bit(VMCS_HIGH(IO_BITMAP_B), vw);
set_bit(MSR_BITMAP, vw);
set_bit(VMCS_HIGH(MSR_BITMAP), vw);
```
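
To illustrate how such a bitmap decides which encodings trap, here is a minimal standalone sketch. It is not Xen code: the `sketch_*` helpers are hypothetical names, the encoding values are assumed to match the SDM field encodings, and the bitmap starts out all-zero purely for simplicity, so only the six encodings above trap here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Field encodings assumed from the SDM; VMCS_HIGH() selects the upper half. */
#define IO_BITMAP_A   0x2000u
#define IO_BITMAP_B   0x2002u
#define MSR_BITMAP    0x2004u
#define VMCS_HIGH(x)  ((x) | 1u)

#define BITMAP_BYTES  4096u  /* one page: one bit per 15-bit encoding */

/* Hypothetical helper: mark an encoding as "VM exit on access". */
static void sketch_set_bit(uint8_t *bm, uint32_t enc)
{
    assert(enc < BITMAP_BYTES * 8);
    bm[enc / 8] |= (uint8_t)(1u << (enc % 8));
}

/* Hypothetical helper: would an access to this encoding cause a VM exit? */
static bool sketch_test_bit(const uint8_t *bm, uint32_t enc)
{
    assert(enc < BITMAP_BYTES * 8);
    return (bm[enc / 8] & (1u << (enc % 8))) != 0;
}

/* Simplification: start from "nothing traps", then trap the six encodings. */
static void sketch_init_vmwrite_bitmap(uint8_t *vw)
{
    memset(vw, 0, BITMAP_BYTES);
    sketch_set_bit(vw, IO_BITMAP_A);
    sketch_set_bit(vw, VMCS_HIGH(IO_BITMAP_A));
    sketch_set_bit(vw, IO_BITMAP_B);
    sketch_set_bit(vw, VMCS_HIGH(IO_BITMAP_B));
    sketch_set_bit(vw, MSR_BITMAP);
    sketch_set_bit(vw, VMCS_HIGH(MSR_BITMAP));
}
```

With this model, `sketch_test_bit(vw, MSR_BITMAP)` is true after initialisation, so a guest `vmwrite` to that field is the kind of access that ends up in the hypervisor rather than going straight to the shadow vmcs.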

If we now execute `vmwrite(MSR_BITMAP, 0)` in the guest it will execute this stack:
```
nvmx_handle_vmwrite
set_vvmcs_safe
set_vvmcs_real_safe
virtual_vmcs_vmwrite_safe
virtual_vmcs_enter
__vmptrld
```
The vmcs pointer loaded in the last step is the one supplied by the guest,
whose contents have since been overwritten.
Since the `vmcs_revision_id` has been overwritten, the hardware rejects the vmptrld, and
Xen calls `BUG()`.
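
The sequence can be replayed in a toy userspace model, sketched below. Everything in it is an assumption made for illustration: `MODEL_REVISION_ID` is an arbitrary stand-in value, the `model_*` names are hypothetical, and the revision-id check is reduced to a plain 32-bit compare. It only shows why a reload performed after the guest's memory write fails, even though the guest's own `vmptrld` succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary stand-in for the revision id the CPU expects. */
#define MODEL_REVISION_ID 0x12u

/* Only the first 32-bit word of the vmcs region matters for this model. */
struct model_vmcs {
    uint32_t revision_id;
};

/* Toy vmptrld: the revision id is checked here, at load time. */
static bool model_vmptrld(const struct model_vmcs *vmcs)
{
    assert(vmcs != NULL);
    return vmcs->revision_id == MODEL_REVISION_ID;
}

/*
 * Replays the reported steps. Returns true if the hypervisor-side reload
 * (the one done while emulating the intercepted vmwrite) is rejected.
 */
static bool model_reload_fails_after_overwrite(void)
{
    struct model_vmcs vmcs = { .revision_id = MODEL_REVISION_ID };

    if ( !model_vmptrld(&vmcs) )            /* guest vmptrld succeeds */
        return false;

    vmcs.revision_id = ~MODEL_REVISION_ID;  /* guest overwrites the id */

    return !model_vmptrld(&vmcs);           /* reload is now rejected */
}
```

In this model `model_reload_fails_after_overwrite()` returns true; in Xen the corresponding rejected `__vmptrld` is what reaches `BUG()`.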

---
A second, similar bug occurs when `VMXOFF` is called while a shadow vmcs is active.
Xen does not clear the shadow vmcs, and the guest crashes if it ever writes to that vmcs again.
This effectively locks the page against modification until a new vmcs is made active.
This should be fixed by applying this patch:

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2432af58e0..3895dd158a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1589,6 +1589,8 @@ static int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
     struct vcpu *v=current;
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);

+	if ( cpu_has_vmx_vmcs_shadowing )
+		nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
     nvmx_purge_vvmcs(v);
     nvmx->vmxon_region_pa = INVALID_PADDR;

As far as I have read, the Intel SDM does not specifically state that a VMXOFF clears the active vmcs; however, it
does not state otherwise either, and I think it's saner to clear it than to crash the guest because of a
vmcs error when it has vmx disabled.

* Re: [BUG]: Crashing Xen when nestedhvm is enabled
  2025-10-25 13:44 [BUG]: Crashing Xen when nestedhvm is enabled Patrick
@ 2025-11-11 11:22 ` Andrew Cooper
From: Andrew Cooper @ 2025-11-11 11:22 UTC
  To: Patrick, xen-devel; +Cc: Jan Beulich, Roger Pau Monné

On 25/10/2025 2:44 pm, Patrick wrote:
> Dear all,
>
> I think I have found a way for a guest to crash the hypervisor,
> when hardware nesting is enabled and we are running on Intel using VMX.
>
> This is done by executing the following steps in the non-nested guest:
>
> - Enable VMX and make a vmcs active
> - Overwrite the revision id of the active vmcs with any invalid id, using a plain memory write
> - Call vmwrite to write the `MSR_BITMAP` field
>
> Basically this:
> ```C
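> vmxon();
> vmcs = alloc();
> /* A valid revision id is needed for vmptrld to succeed. */
> *(uint32_t*)vmcs = correct_vmcs_revision_id;
> vmptrld(vmcs);
> /* Corrupt the revision id of the now-active vmcs. */
> *(uint32_t*)vmcs = invalid_vmcs_revision_id;
> /* Writes to MSR_BITMAP are intercepted and emulated by Xen. */
> vmwrite(MSR_BITMAP, 0);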
> ```
>
> The `vmptrld` will set the provided vmcs as the link pointer as seen in
> `xen/arch/x86/hvm/vmx/vvmx.c:1834`
> ```
> if ( cpu_has_vmx_vmcs_shadowing )
>     nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
> ```
>
> If the guest now calls `VMWRITE` it will access that vmcs directly,
> except when writing/reading the `IO_BITMAP` or the `MSR_BITMAP`:
>
> `xen/arch/x86/hvm/vmx/vvmx.c:107`
> ```
> /*
> * For the following 6 encodings, we need to handle them in VMM.
> * Let them vmexit as usual.
> */
> set_bit(IO_BITMAP_A, vw);
> set_bit(VMCS_HIGH(IO_BITMAP_A), vw);
> set_bit(IO_BITMAP_B, vw);
> set_bit(VMCS_HIGH(IO_BITMAP_B), vw);
> set_bit(MSR_BITMAP, vw);
> set_bit(VMCS_HIGH(MSR_BITMAP), vw);
> ```
>
> If we now execute `vmwrite(MSR_BITMAP, 0)` in the guest it will execute this stack:
> ```
> nvmx_handle_vmwrite
> set_vvmcs_safe
> set_vvmcs_real_safe
> virtual_vmcs_vmwrite_safe
> virtual_vmcs_enter
> __vmptrld
> ```
> The vmcs pointer loaded in the last step is the one supplied by the guest,
> whose contents have since been overwritten.
> Since the `vmcs_revision_id` has been overwritten, the hardware rejects the vmptrld, and
> Xen calls `BUG()`.

Hello, thanks for reporting this.

The manual very clearly says "Don't Do This", but Xen should not crash
as a result.

I think the bug is letting the shadow VMCS remain in guest memory.  It's
also ridiculous that we intercept writes into control state then emulate
what hardware would have done anyway.

One interesting thing about VMCSes is that VMXON/VMPTRLD may make them
non-coherent with the rest of memory.  This is implementation dependent,
but works in our favour.

Architecturally, the only time revision_id is sampled is during
VMPTRLD.  There are no equivalents to VMX Error 11 for other
instructions, and no mechanism I can see for reporting this specific
failure during VMEntry or non-root operations.

But, with nested virt in the mix, while a vCPU is in (v)non-root
mode, we can de-schedule entirely, run another VM, and come back to this
vCPU.  We need to be able to guarantee that such errors can't occur, or
to be able to forward the errors properly into the guest.  There seems
to be no option for the latter.

>
> ---
> A second, similar bug occurs when `VMXOFF` is called while a shadow vmcs is active.
> Xen does not clear the shadow vmcs, and the guest crashes if it ever writes to that vmcs again.
> This effectively locks the page against modification until a new vmcs is made active.
> This should be fixed by applying this patch:
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 2432af58e0..3895dd158a 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1589,6 +1589,8 @@ static int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
>      struct vcpu *v=current;
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>  
> +	if ( cpu_has_vmx_vmcs_shadowing )
> +		nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
>      nvmx_purge_vvmcs(v);
>      nvmx->vmxon_region_pa = INVALID_PADDR;
>
> As far as I have read, the Intel SDM does not specifically state that a VMXOFF clears the active vmcs; however, it
> does not state otherwise either, and I think it's saner to clear it than to crash the guest because of a
> vmcs error when it has vmx disabled.

This is different.  The manual very clearly says the VMCSes may become
corrupt if they're not VMCLEAR'd before VMXOFF.

In fact, there's a long standing bug/misfeature in Intel CPUs that upon
VMXOFF, non-coherent VMCSes remain non-coherent until the next VMXON, at
which point new VMPTRLD's can cause older VMCSes to be cleared (the CPU
can only hold a certain number of VMCSes in internal registers) and
written back to main memory.

In the case of e.g. kexec, the new kernel has no ability to figure out
that this is going on.

Xen will need to clear all guest VMCSes for safety, but if we were
feeling helpful, we probably ought to poison such VMCSes with 0xc2 or so.
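
As a rough illustration of the poisoning idea only, a sketch might look like the following. The helper name and the `SKETCH_*` constants are hypothetical, a 4 KiB VMCS region is assumed, and the real change would also need to VMCLEAR the region first and write through Xen's own guest-frame mapping, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed size of one VMCS region */
#define SKETCH_POISON    0xc2   /* poison byte suggested above */

/* Hypothetical helper: fill an already-cleared VMCS page with poison. */
static void sketch_poison_vmcs_page(void *page)
{
    assert(page != NULL);
    memset(page, SKETCH_POISON, SKETCH_PAGE_SIZE);
}
```

A guest that later reads the page would then see an obviously bogus pattern (including in the revision id word) rather than stale or partially written-back state.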

I've opened https://gitlab.com/xen-project/xen/-/issues/218 but it might
be a little while until we get to this.

~Andrew
