From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Jim Mattson <jmattson@google.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 10/13] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN
Date: Tue, 9 Dec 2025 08:03:15 -0800 [thread overview]
Message-ID: <aThIQzni6fC1qdgj@google.com> (raw)
In-Reply-To: <20251110222922.613224-11-yosry.ahmed@linux.dev>
On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> All accesses to the VMCB12 in the guest memory are limited to
> nested_svm_vmrun(). However, the VMCB12 remains mapped until the end of
> the function execution. Unmapping right after the consistency checks is
> possible, but it becomes easy-ish to introduce bugs where 'vmcb12' is
> used after being unmapped.
>
> Move all accesses to the VMCB12 into a new helper,
> nested_svm_vmrun_read_vmcb12(), that maps the VMCB12,
> caches the needed fields, performs consistency checks, and unmaps it.
> This limits the scope of the VMCB12 mapping appropriately. It also
> slightly simplifies the cleanup path of nested_svm_vmrun().
>
> nested_svm_vmrun_read_vmcb12() returns -1 if the consistency checks
> fail, maintaining the current behavior of skipping the instructions and
> unmapping the VMCB12 (although in the opposite order).
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 59 ++++++++++++++++++++++-----------------
> 1 file changed, 34 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index ddcd545ec1c3c..a48668c36a191 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1023,12 +1023,39 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
> return 0;
> }
>
> +static int nested_svm_vmrun_read_vmcb12(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
"read_vmcb12"() sounds like a generic helper to read a specific field. And if
the name is more specific, then I think we can drop the "vmrun" scoping. To
aligh with similar functions in VMX and __nested_copy_vmcb_save_to_cache(), how
about nested_svm_copy_vmcb12_to_cache()?
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + struct kvm_host_map map;
> + struct vmcb *vmcb12;
> + int ret;
> +
> + ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
> + if (ret)
> + return ret;
> +
> + vmcb12 = map.hva;
> +
> + nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
> + nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
> +
> + if (!nested_vmcb_check_save(vcpu) ||
> + !nested_vmcb_check_controls(vcpu)) {
> + vmcb12->control.exit_code = SVM_EXIT_ERR;
> + vmcb12->control.exit_code_hi = 0;
> + vmcb12->control.exit_info_1 = 0;
> + vmcb12->control.exit_info_2 = 0;
> + ret = -1;
I don't love shoving the consistency checks in here. I get why you did it, but
it's very surprising to see (and/or easy to miss) these consistency checks. The
caller also ends up quite wonky:
if (ret == -EINVAL) {
kvm_inject_gp(vcpu, 0);
return 1;
} else if (ret) {
return kvm_skip_emulated_instruction(vcpu);
}
ret = kvm_skip_emulated_instruction(vcpu);
Ha! And it's buggy. __kvm_vcpu_map() can return -EFAULT if creating a host
mapping fails. Eww, and blindly using '-1' as the "failed a consistency check"
is equally cross, as it relies on kvm_vcpu_map() not returning -EPERM in a very
weird way.
Ugh, and there's also this nastiness in nested_vmcb_check_controls():
* Make sure we did not enter guest mode yet, in which case
* kvm_read_cr0() could return L2's CR0.
*/
WARN_ON_ONCE(is_guest_mode(vcpu));
return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
nested_vmcb_check_save() and nested_vmcb_check_controls() really shouldn't exist.
They just make it harder to see what KVM is checking in the "normal" flow.
Aha! And I'm fairly certain there are at least two pre-existing bugs due to KVM
doing "early" consistency checks in nested_svm_vmrun().
1. KVM doesn't clear GIF on the early #VMEXIT. In classic APM fashion, nothing
_requires_ GIF=0 before VMRUN:
It is assumed that VMM software cleared GIF some time before executing
the VMRUN instruction, to ensure an atomic state switch.
And so VMRUN with GIF=1 that hits an "early" consistency check #VMEXIT would
incorrectly leave GIF=1.
2. svm_leave_smm() is missing consistency checks on the newly loaded guest state,
because the checks aren't performed by enter_svm_guest_mode(). I don't see
anything that would prevent vmcb12 from being modified by the guest bewteen
SMI and RSM.
Moving the consistency checks into enter_svm_guest_mode() would solve all three
(four?) problems. And as a bonus, nested_svm_copy_vmcb12_to_cache() can use
kvm_vcpu_map_readonly().
Compile tested only, but I think we can end up with delta like so:
---
arch/x86/kvm/svm/nested.c | 67 ++++++++++++---------------------------
1 file changed, 20 insertions(+), 47 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7c86987fdaca..8a0df6c535b5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -372,9 +372,9 @@ static bool nested_svm_check_event_inj(struct kvm_vcpu *vcpu, u32 event_inj)
return true;
}
-static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control,
- unsigned long l1_cr0)
+static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
+ struct vmcb_ctrl_area_cached *control,
+ unsigned long l1_cr0)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -408,8 +408,8 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
}
/* Common checks that apply to both L1 and L2 state. */
-static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
- struct vmcb_save_area_cached *save)
+static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
+ struct vmcb_save_area_cached *save)
{
if (CC(!(save->efer & EFER_SVME)))
return false;
@@ -448,27 +448,6 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
return true;
}
-static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
- struct vmcb_save_area_cached *save = &svm->nested.save;
-
- return __nested_vmcb_check_save(vcpu, save);
-}
-
-static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
- struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
-
- /*
- * Make sure we did not enter guest mode yet, in which case
- * kvm_read_cr0() could return L2's CR0.
- */
- WARN_ON_ONCE(is_guest_mode(vcpu));
- return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
-}
-
static
void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
struct vmcb_ctrl_area_cached *to,
@@ -1004,6 +983,12 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
svm_switch_vmcb(svm, &svm->nested.vmcb02);
+
+ if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+ !nested_vmcb_check_controls(vcpu, &svm->nested.ctl,
+ svm->vmcb01.ptr->save.cr0))
+ return -EINVAL;
+
nested_vmcb02_prepare_control(svm, save->rip, save->cs.base);
nested_vmcb02_prepare_save(svm);
@@ -1025,33 +1010,24 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_vmrun)
return 0;
}
-static int nested_svm_vmrun_read_vmcb12(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
+static int nested_svm_copy_vmcb12_to_cache(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_host_map map;
struct vmcb *vmcb12;
- int ret;
+ int r;
- ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
- if (ret)
- return ret;
+ r = kvm_vcpu_map_readonly(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
+ if (r)
+ return r;
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- if (!nested_vmcb_check_save(vcpu) ||
- !nested_vmcb_check_controls(vcpu)) {
- vmcb12->control.exit_code = SVM_EXIT_ERR;
- vmcb12->control.exit_code_hi = -1u;
- vmcb12->control.exit_info_1 = 0;
- vmcb12->control.exit_info_2 = 0;
- ret = -1;
- }
-
kvm_vcpu_unmap(vcpu, &map);
- return ret;
+ return 0;
}
int nested_svm_vmrun(struct kvm_vcpu *vcpu)
@@ -1082,12 +1058,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return -EINVAL;
vmcb12_gpa = svm->vmcb->save.rax;
- ret = nested_svm_vmrun_read_vmcb12(vcpu, vmcb12_gpa);
- if (ret == -EINVAL) {
+ if (nested_svm_copy_vmcb12_to_cache(vcpu, vmcb12_gpa)) {
kvm_inject_gp(vcpu, 0);
return 1;
- } else if (ret) {
- return kvm_skip_emulated_instruction(vcpu);
}
ret = kvm_skip_emulated_instruction(vcpu);
@@ -1919,7 +1892,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
/* 'save' contains L1 state saved from before VMRUN */
- if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
+ if (!nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
goto out_free;
/*
@@ -1938,7 +1911,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
if (!(save->cr0 & X86_CR0_PG) ||
!(save->cr0 & X86_CR0_PE) ||
(save->rflags & X86_EFLAGS_VM) ||
- !__nested_vmcb_check_save(vcpu, &save_cached))
+ !nested_vmcb_check_save(vcpu, &save_cached))
goto out_free;
base-commit: 01597a665f5dcf8d7cfbedf36f4e6d46d045eb4f
--
next prev parent reply other threads:[~2025-12-09 16:03 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 22:29 [PATCH v2 00/13] Nested SVM fixes, cleanups, and hardening Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 01/13] KVM: SVM: Switch svm_copy_lbrs() to a macro Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 02/13] KVM: SVM: Add missing save/restore handling of LBR MSRs Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 03/13] KVM: selftests: Add a test for LBR save/restore (ft. nested) Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE Yosry Ahmed
2025-12-09 16:27 ` Sean Christopherson
2025-12-09 18:07 ` Yosry Ahmed
2025-12-09 18:26 ` Sean Christopherson
2025-12-09 18:35 ` Yosry Ahmed
2025-12-09 18:42 ` Sean Christopherson
2025-12-09 20:02 ` Yosry Ahmed
2025-12-12 18:32 ` Sean Christopherson
2025-12-12 18:38 ` Yosry Ahmed
2025-12-13 1:07 ` Sean Christopherson
2025-11-10 22:29 ` [PATCH v2 05/13] KVM: nSVM: Add missing consistency check for EFER, CR0, CR4, and CS Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 06/13] KVM: nSVM: Add missing consistency check for event_inj Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 07/13] KVM: SVM: Rename vmcb->nested_ctl to vmcb->misc_ctl Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 08/13] KVM: SVM: Rename vmcb->virt_ext to vmcb->misc_ctl2 Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 09/13] KVM: nSVM: Cache all used fields from VMCB12 Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 10/13] KVM: nSVM: Restrict mapping VMCB12 on nested VMRUN Yosry Ahmed
2025-12-09 16:03 ` Sean Christopherson [this message]
2025-12-09 18:24 ` Yosry Ahmed
2025-12-09 18:49 ` Sean Christopherson
2025-12-10 23:05 ` Yosry Ahmed
2025-12-11 0:55 ` Yosry Ahmed
2025-12-12 23:30 ` Sean Christopherson
2025-11-10 22:29 ` [PATCH v2 11/13] KVM: nSVM: Simplify nested_svm_vmrun() Yosry Ahmed
2025-12-09 16:11 ` Sean Christopherson
2025-12-09 18:30 ` Yosry Ahmed
2025-12-09 19:09 ` Sean Christopherson
2025-12-10 16:16 ` Yosry Ahmed
2025-12-12 23:23 ` Sean Christopherson
2025-12-11 19:25 ` Yosry Ahmed
2025-12-11 20:13 ` Yosry Ahmed
2025-12-13 0:01 ` Sean Christopherson
2025-12-15 18:34 ` Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 12/13] KVM: nSVM: Sanitize control fields copied from VMCB12 Yosry Ahmed
2025-12-09 16:19 ` Sean Christopherson
2025-12-09 18:37 ` Yosry Ahmed
2025-11-10 22:29 ` [PATCH v2 13/13] KVM: nSVM: Only copy NP_ENABLE from VMCB01's misc_ctl Yosry Ahmed
2025-12-09 16:23 ` Sean Christopherson
2025-12-09 18:38 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aThIQzni6fC1qdgj@google.com \
--to=seanjc@google.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.