From: Sean Christopherson <seanjc@google.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com,
joro@8bytes.org, wanpengli@tencent.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: X86: Fix the comments in prepare_vmcs02_rare()
Date: Fri, 15 Jul 2022 15:56:31 +0000 [thread overview]
Message-ID: <YtGOL4jIMQwoW5vb@google.com> (raw)
In-Reply-To: <20220715114211.53175-3-yu.c.zhang@linux.intel.com>
On Fri, Jul 15, 2022, Yu Zhang wrote:
> Although EB.PF in vmcs02 is still set by simply "or"ing the EB of
> vmcs01 and vmcs12, the explanation is obsolete. "enable_ept" being
> set is not the only reason for L0 to clear its EB.PF.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 778f82015f03..634a7d218048 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2451,10 +2451,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> * is not easy (if at all possible?) to merge L0 and L1's desires, we
> * simply ask to exit on each and every L2 page fault. This is done by
> * setting MASK=MATCH=0 and (see below) EB.PF=1.
> - * Note that below we don't need special code to set EB.PF beyond the
> - * "or"ing of the EB of vmcs01 and vmcs12, because when enable_ept,
> - * vmcs01's EB.PF is 0 so the "or" will take vmcs12's value, and when
> - * !enable_ept, EB.PF is 1, so the "or" will always be 1.
> + * Note that EB.PF is set by "or"ing of the EB of vmcs01 and vmcs12,
> + * because when L0 has no desire to intercept #PF, vmcs01's EB.PF is 0
> + * so the "or" will take vmcs12's value, otherwise EB.PF is 1, so the
> + * "or" will always be 1.
Oof! I was going to respond with a variety of nits (about the existing comment),
and even suggest that we address the TODO just out of sight, but looking at all
of this made me realize there's a bug here! vmx_update_exception_bitmap() doesn't
update MASK and MATCH!
Hitting the bug is extremely unlikely, as it would require changing the guest's
MAXPHYADDR via KVM_SET_CPUID2 _after_ KVM_SET_NESTED_STATE, but before KVM_RUN
(because KVM now disallows changin CPUID after KVM_RUN).
During KVM_SET_CPUID2, KVM will invoke vmx_update_exception_bitmap() to refresh
the exception bitmap to handle the ept=1 && allow_smaller_maxphyaddr=1 scenario.
But when L2 is active, vmx_update_exception_bitmap() assumes vmcs02 already has
the correct MASK+MATCH because of the "clear both if KVM and L1 both want #PF"
behavior. But if KVM's desire to intercept #PF changes from 0=>1, then KVM will
run L2 with the MASK+MATCH from vmcs12 because vmx_need_pf_intercept() would have
returned false at the time of prepare_vmcs02_rare().
Fixing the bug is fairly straightforward, and presents a good opportunity to
clean up the code (and this comment) and address the TODO.
Unless someone objects to my suggestion for patch 01, can you send a new version
of patch 01? I'll send a separate series to fix this theoretical bug, avoid
writing MASK+MATCH when vmcs0x.EXCEPTION_BITMAP.PF+0, and to address the TODO.
E.g. I believe this is what we want to end up with:
if (vmcs12)
eb |= vmcs12->exception_bitmap;
/*
* #PF is conditionally intercepted based on the #PF error code (PFEC)
* combined with the exception bitmap. #PF is intercept if:
*
* EXCEPTION_BITMAP.PF=1 && ((PFEC & MASK) == MATCH).
*
* If any #PF is being intercepted, update MASK+MATCH, otherwise leave
* them alone they do not affect interception (EXCEPTION_BITMAP.PF=0).
*/
if (eb & (1u << PF_VECTOR)) {
/*
* If EPT is enabled, #PF is only intercepted if MAXPHYADDR is
* smaller on the guest than on the host. In that case, KVM
* only needs to intercept present, non-reserved #PF. If EPT
* is disabled, i.e. KVM is using shadow paging, KVM needs to
* intercept all #PF. Note, whether or not KVM wants to
* intercept _any_ #PF is handled below.
*/
if (enable_ept) {
pfec_mask = PFERR_PRESENT_MASK | PFERR_RSVD_MASK;
pfec_match = PFERR_PRESENT_MASK;
} else {
pfec_mask = 0;
pfec_match = 0;
}
if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR))) {
/* L1 doesn't want to intercept #PF, use KVM's MASK+MATCH. */
} else if (!kvm_needs_pf_intercept) {
/* KVM doesn't want to intercept #PF, use L1's MASK+MATCH. */
pfec_mask = vmcs12->page_fault_error_code_mask;
pfec_match = vmcs12->page_fault_error_code_match;
} else if (pfec_mask != vmcs12->page_fault_error_code_mask ||
pfec_match != vmcs12->page_fault_error_code_mask) {
/*
* KVM and L1 want to intercept #PF with different MASK
* and/or MATCH. For simplicity, intercept all #PF by
* clearing MASK+MATCH. Merging KVM's and L1's desires
* is quite complex, while the odds of meaningfully
* reducing what #PFs are intercept are low.
*/
pfec_mask = 0;
pfec_match = 0;
} else {
/* KVM and L1 have identical MASK+MATCH. */
}
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, pfec_mask);
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, pfec_match);
}
next prev parent reply other threads:[~2022-07-15 15:56 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-15 11:42 [PATCH 0/2] Fix the #PF injection logic for smaller MAXPHYADDR in nested Yu Zhang
2022-07-15 11:42 ` [PATCH 1/2] KVM: X86: Initialize 'fault' in kvm_fixup_and_inject_pf_error() Yu Zhang
2022-07-15 14:47 ` Sean Christopherson
2022-07-18 6:52 ` Yu Zhang
2022-07-15 11:42 ` [PATCH 2/2] KVM: X86: Fix the comments in prepare_vmcs02_rare() Yu Zhang
2022-07-15 15:56 ` Sean Christopherson [this message]
2022-07-18 7:58 ` Yu Zhang
2022-07-19 0:09 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YtGOL4jIMQwoW5vb@google.com \
--to=seanjc@google.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox