All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: stable@vger.kernel.org,
	Xiao Guangrong <guangrong.xiao@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>
Subject: Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Date: Thu, 10 Mar 2016 16:27:28 +0800	[thread overview]
Message-ID: <56E12FF0.90202@linux.intel.com> (raw)
In-Reply-To: <1457437467-65707-2-git-send-email-pbonzini@redhat.com>



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
> Yes, all of these are needed. :) This is admittedly a bit odd, but
> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
> and of course ept=0.
>
> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
> to fault and a supervisor write to succeed (which is correct because
> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
> This enables user reads; it also disables supervisor writes, the next
> of which will then flip the bits again.
>
> When SMEP is in effect, however, U=0 will enable kernel execution of
> this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
> with U=0.  If the guest has not enabled NX, the result is a continuous
> stream of page faults due to the NX bit being reserved.
>
> The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
> switch.

Good catch!

So it only hurts the box which has cpu_has_load_ia32_efer support otherwise
NX is inherited from kernel (kernel always sets NX if CPU supports it),
right?

>
> There is another bug in the reserved bit check, which I've split to a
> separate patch for easier application to stable kernels.
>

> Cc: stable@vger.kernel.org
> Cc: Xiao Guangrong <guangrong.xiao@redhat.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   Documentation/virtual/kvm/mmu.txt |  3 ++-
>   arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
>   2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index daf9c0f742d2..c81731096a43 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -358,7 +358,8 @@ In the first case there are two additional complications:
>   - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
>     the kernel may now execute it.  We handle this by also setting spte.nx.
>     If we get a user fetch or read fault, we'll change spte.u=1 and
> -  spte.nx=gpte.nx back.
> +  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
> +  shadow paging is in use.
>   - if CR4.SMAP is disabled: since the page has been changed to a kernel
>     page, it can not be reused when CR4.SMAP is enabled. We set
>     CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6e51493ff4f9..91830809d837 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>   	guest_efer = vmx->vcpu.arch.efer;
>
>   	/*
> -	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
> -	 * outside long mode
> +	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
>   	 */
> -	ignore_bits = EFER_NX | EFER_SCE;
> +	ignore_bits = EFER_SCE;
>   #ifdef CONFIG_X86_64
>   	ignore_bits |= EFER_LMA | EFER_LME;
>   	/* SCE is meaningful only in long mode on Intel */
>   	if (guest_efer & EFER_LMA)
>   		ignore_bits &= ~(u64)EFER_SCE;
>   #endif
> -	guest_efer &= ~ignore_bits;
> -	guest_efer |= host_efer & ignore_bits;
> -	vmx->guest_msrs[efer_offset].data = guest_efer;
> -	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
> +	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */

> +	if (!enable_ept) {
> +		guest_efer |= EFER_NX;
> +		ignore_bits |= EFER_NX;

Update ignore_bits is not necessary i think.

> +	}
>
>   	clear_atomic_switch_msr(vmx, MSR_EFER);
>
> @@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>   	 */
>   	if (cpu_has_load_ia32_efer ||
>   	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
> -		guest_efer = vmx->vcpu.arch.efer;
>   		if (!(guest_efer & EFER_LMA))
>   			guest_efer &= ~EFER_LME;
>   		if (guest_efer != host_efer)
>   			add_atomic_switch_msr(vmx, MSR_EFER,
>   					      guest_efer, host_efer);

So, why not set EFER_NX (if !ept) just in this branch to make the fix more simpler?

  reply	other threads:[~2016-03-10  8:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-08 11:44 [PATCH 0/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
2016-03-08 11:44 ` [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo Paolo Bonzini
2016-03-10  8:27   ` Xiao Guangrong [this message]
2016-03-10 10:01     ` Paolo Bonzini
2016-03-10 10:09     ` Paolo Bonzini
2016-03-10 12:14       ` Xiao Guangrong
2016-03-10 12:26         ` Paolo Bonzini
2016-03-10  8:46   ` Xiao Guangrong
2016-03-10 10:03     ` Paolo Bonzini
2016-03-08 11:44 ` [PATCH 2/2] KVM: MMU: fix reserved bit check for pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Paolo Bonzini
2016-03-10  8:36   ` Xiao Guangrong
2016-03-10 10:02     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56E12FF0.90202@linux.intel.com \
    --to=guangrong.xiao@linux.intel.com \
    --cc=guangrong.xiao@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.