xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Patrick <patrick@failmail.dev>, xen-devel@lists.xen.org
Cc: "Jan Beulich" <jbeulich@suse.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [BUG]: Crashing Xen when nestedhvm is enabled
Date: Tue, 11 Nov 2025 11:22:22 +0000	[thread overview]
Message-ID: <7957dea8-dbc0-4c0e-8364-0b1ff2e75f25@citrix.com> (raw)
In-Reply-To: <ab3a978a-2088-4a39-a4c1-822ae5050fe6@failmail.dev>

On 25/10/2025 2:44 pm, Patrick wrote:
> Dear all,
>
> I think I have found a way for a guest to crash the hypervisor,
> when hardware nesting is enabled and we are running on Intel using VMX.
>
> This is done by executing the following steps in the non nested guest:
>
> - Enable VMX and set a vmcs active
> - Override the revision id of the active vmcs using memory access to any invalid id
> - call vmwrite to write the `MSR_BITMAP`
>
> Basically this:
> ```C
> vmxon();
> vmcs = alloc();
> *(uint32_t*)vmcs = correct_vmcs_revision_id;
> vmptrld(vmcs);
> *vmcs = invalide_vmcs_revision_id;
> vmwrite(MSR_BITMAP, NULL);
> ```
>
> The `vmptrld` will set the provided vmcs as the link pointer as seen in
> `xen/arch/x86/hvm/vmx/vvmx.c:1834`
> ```
> if ( cpu_has_vmx_vmcs_shadowing )
>     nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
> ```
>
> If the guest now calls `VMWRITE` it will access that vmcs directly,
> execpt if writing/reading the `IO_BITMAP` or the `MSR_BITMAP`
>
> `xen/arch/x86/hvm/vmx/vvmx.c:107`
> ```
> /*
> * For the following 6 encodings, we need to handle them in VMM.
> * Let them vmexit as usual.
> */
> set_bit(IO_BITMAP_A, vw);
> set_bit(VMCS_HIGH(IO_BITMAP_A), vw);
> set_bit(IO_BITMAP_B, vw);
> set_bit(VMCS_HIGH(IO_BITMAP_B), vw);
> set_bit(MSR_BITMAP, vw);
> set_bit(VMCS_HIGH(MSR_BITMAP), vw);
> ```
>
> If we now execute `vmwrite(MSR_BITMAP, 0)` in the guest it will execute this stack:
> ```
> nvmx_handle_vmwrite
> set_vvmcs_safe
> set_vvmcs_real_safe
> virtual_vmcs_vmwrite_safe
> virtual_vmcs_enter
> __vmptrld
> ```
> The vmcs pointer being loaded in the last step being the one supplied by the guest
> that has been overwritten.
> Since we have overwritten the `vmcs_revision_id` the hardware will reject the vmptrld, which
> will call `BUG()`.

Hello, thanks for reporting this.

The manual very clearly says "Don't Do This", but Xen should not crash
as a result.

I think the bug is letting the shadow VMCS remain in guest memory.  It's
also ridiculous that we intercept writes into control state then emulate
what hardware would have done anyway.

One interesting thing about VMCSes is that VMXON/VMPTRLD may make them
non-coherent with the rest of memory.  This is implementation dependent,
but works in our favour.

Architecturally, the only time revision_id is sampled is during
VMPTRLD.  There are no equivalents to VMX Error 11 for other
instructions, and no mechanism I can see for reporting this specific
failure during VMEntry or non-root operations.

But, with nested virt in the mix, while a vCPU is is in (v)non-root
mode, we can de-schedule entirely, run another VM, and come back to this
vCPU.  We need to be able to guarantee that such errors can't occur, or
to be able to forward the errors properly into the guest.  There seems
to be no option for the latter.


>
> ---
> A secondary similar bug happens when calling `VMXOFF` while a shadow vmcs is active.
> This will not clear the shadow vmcs, and crash the guest if it ever writes to the vmcs again.
> Effectively locking the page of being modified until a new vmcs is set active.
> This should be fixed by applying this patch:
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 2432af58e0..3895dd158a 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1589,6 +1589,8 @@ static int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
>      struct vcpu *v=current;
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>  
> +	if ( cpu_has_vmx_vmcs_shadowing )
> +		nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
>      nvmx_purge_vvmcs(v);
>      nvmx->vmxon_region_pa = INVALID_PADDR;
>
> As far as I have read it is not specifically stated in the Intel SDM that a VMXOFF clears the active vmcs, however it
> does also not state anything otherwise and I thinks it's saner to clear it than to crash the guest because of an
> vmcs error, when it has vmx disabled.

This is different.  The manual very clearly says the VMCSes may become
corrupt if they're not VMCLEAR'd before VMXOFF.

In fact, there's a long standing bug/misfeature in Intel CPUs that upon
VMXOFF, non-coherent VMCSes remain non-coherent until the next VMXON, at
which point new VMPTRLD's can cause older VMCSes to be cleared (the CPU
can only hold a certain number of VMCSes in internal registers) and
written back to main memory.

In the case of e.g. kexec, the new kernel has no ability to figure out
that this is going on.

Xen will need to clear all guest VMCSes for safety, but if we were
feeling helpful, we probably ought to poison such VMCSes with 0xc2 or so.

I've opened https://gitlab.com/xen-project/xen/-/issues/218 but it might
be a little while until we get to this.

~Andrew


      reply	other threads:[~2025-11-11 11:23 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-25 13:44 [BUG]: Crashing Xen when nestedhvm is enabled Patrick
2025-11-11 11:22 ` Andrew Cooper [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7957dea8-dbc0-4c0e-8364-0b1ff2e75f25@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=patrick@failmail.dev \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).