Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Xiao Guangrong <guangrong.xiao@gmail.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: "Peter Feiner" <pfeiner@google.com>,
	"David Matlack" <dmatlack@google.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Xiao Guangrong" <xiaoguangrong@tencent.com>,
	"Wanpeng Li" <wanpeng.li@hotmail.com>,
	kevin.tian@intel.com
Subject: Re: [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses
Date: Fri, 12 May 2017 15:38:03 +0800	[thread overview]
Message-ID: <4a02522c-50a1-0aa2-879c-98ba7631ffbe@gmail.com> (raw)
In-Reply-To: <1494501810-11822-3-git-send-email-pbonzini@redhat.com>


CC Kevin as i am not sure if Intel is aware of this issue, it
breaks other hypervisors, e.g, Xen, as swell.

On 05/11/2017 07:23 PM, Paolo Bonzini wrote:
> The new ept_access_test_paddr_read_only_ad_disabled testcase
> caused an infinite stream of EPT violations because KVM did not
> find anything bad in the page tables and kept re-executing the
> faulting instruction.
> 
> This is because the exit qualification said we were reading from
> the page tables, but actually writing the cause of the EPT violation
> was writing the A/D bits.  This happened even with eptad=0, quite
> surprisingly.
> 
> Thus, always treat guest page table accesses as read+write operations,
> even if the exit qualification says otherwise.  This fixes the
> testcase.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++-------------
>   1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c6f4ad44aa95..c868cbdad29a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>   	u32 error_code;
>   
>   	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +	trace_kvm_page_fault(gpa, exit_qualification);
>   
> -	if (is_guest_mode(vcpu)
> -	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> -		/*
> -		 * Fix up exit_qualification according to whether guest
> -		 * page table accesses are reads or writes.
> -		 */
> -		u64 eptp = nested_ept_get_cr3(vcpu);
> -		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
> -			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> -	}
> +	/*
> +	 * All guest page table accesses are potential writes to A/D bits.
> +	 * but EPT microcode only reports them as such when EPT A/D is
> +	 * enabled.  Tracing ept_access_test_paddr_read_only_ad_disabled (from
> +	 * kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
> +	 * does not change its behavior when EPTP enables A/D bits; the only
> +	 * difference is in the exit qualification.  So fix this up here.
> +	 */
> +	if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
> +		exit_qualification |= EPT_VIOLATION_ACC_WRITE;
>   
>   	/*
>   	 * EPT violation happened while executing iret from NMI,
> @@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>   			(exit_qualification & INTR_INFO_UNBLOCK_NMI))
>   		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
>   
> -	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	trace_kvm_page_fault(gpa, exit_qualification);
> -
>   	/* Is it a read fault? */
>   	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
>   		     ? PFERR_USER_MASK : 0;
> @@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>   		      ? PFERR_PRESENT_MASK : 0;
>   
>   	vcpu->arch.gpa_available = true;
> +
> +	if (is_guest_mode(vcpu)
> +	    && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
> +		/*
> +		 * Now fix up exit_qualification according to what the
> +		 * L1 hypervisor expects to see.
> +		 */
> +		u64 eptp = nested_ept_get_cr3(vcpu);
> +		if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
> +			exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
> +	}

I am not sure if this is really needed, it (PFEC.W = 0 if A/D need to be set on
page structures) is not we expect.

Maybe always report the right behavior is better? Especially，Intel may fix its
microcode as it hurts the newest CPUs as well.

Thanks!

next prev parent reply	other threads:[~2017-05-12  7:38 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 11:23 [PATCH 0/2] KVM: nVMX: nested EPT fixes Paolo Bonzini
2017-05-11 11:23 ` [PATCH 1/2] KVM: nVMX: fix EPT permissions as reported in exit qualification Paolo Bonzini
2017-05-12  3:59   ` Xiao Guangrong
2017-05-12  5:13     ` Xiao Guangrong
2017-05-11 11:23 ` [PATCH 2/2] KVM: nVMX: fix nEPT handling of guest page table accesses Paolo Bonzini
2017-05-12  7:38   ` Xiao Guangrong [this message]
2017-05-12  8:39     ` Paolo Bonzini
2017-05-16 14:03 ` [PATCH 0/2] KVM: nVMX: nested EPT fixes Radim Krčmář

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a02522c-50a1-0aa2-879c-98ba7631ffbe@gmail.com \
    --to=guangrong.xiao@gmail.com \
    --cc=dmatlack@google.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pfeiner@google.com \
    --cc=rkrcmar@redhat.com \
    --cc=wanpeng.li@hotmail.com \
    --cc=xiaoguangrong@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox