public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH 4.4 10/50] KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Date: Mon, 14 Mar 2016 10:50:28 -0700	[thread overview]
Message-ID: <20160314175014.873643926@linuxfoundation.org> (raw)
In-Reply-To: <20160314175013.403628835@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 844a5fe219cf472060315971e15cbf97674a3324 upstream.

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution.  This will still cause a user write to fault, while
supervisor writes will succeed.  User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
will be enabled and supervisor writes disabled, going back to the
originary situation where supervisor writes fault spuriously.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/virtual/kvm/mmu.txt |    3 ++-
 arch/x86/kvm/vmx.c                |   36 +++++++++++++++++++++++-------------
 2 files changed, 25 insertions(+), 14 deletions(-)

--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two addition
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled. We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1792,26 +1792,31 @@ static void reload_tss(void)
 
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
-	u64 guest_efer;
-	u64 ignore_bits;
+	u64 guest_efer = vmx->vcpu.arch.efer;
+	u64 ignore_bits = 0;
 
-	guest_efer = vmx->vcpu.arch.efer;
+	if (!enable_ept) {
+		/*
+		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
+		 * host CPUID is more efficient than testing guest CPUID
+		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SMEP))
+			guest_efer |= EFER_NX;
+		else if (!(guest_efer & EFER_NX))
+			ignore_bits |= EFER_NX;
+	}
 
 	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
 	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits |= EFER_SCE;
 #ifdef CONFIG_X86_64
 	ignore_bits |= EFER_LMA | EFER_LME;
 	/* SCE is meaningful only in long mode on Intel */
 	if (guest_efer & EFER_LMA)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 
 	clear_atomic_switch_msr(vmx, MSR_EFER);
 
@@ -1822,16 +1827,21 @@ static bool update_transition_efer(struc
 	 */
 	if (cpu_has_load_ia32_efer ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
 					      guest_efer, host_efer);
 		return false;
-	}
+	} else {
+		guest_efer &= ~ignore_bits;
+		guest_efer |= host_efer & ignore_bits;
 
-	return true;
+		vmx->guest_msrs[efer_offset].data = guest_efer;
+		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+		return true;
+	}
 }
 
 static unsigned long segment_base(u16 selector)

  parent reply	other threads:[~2016-03-14 18:01 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-14 17:50 [PATCH 4.4 00/50] 4.4.6-stable review Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 01/50] arm64: account for sparsemem section alignment when choosing vmemmap offset Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 02/50] ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 03/50] ARM: dts: dra7: do not gate cpsw clock due to errata i877 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 04/50] ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 06/50] kvm: cap halt polling at exactly halt_poll_ns Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 08/50] KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 09/50] KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit Greg Kroah-Hartman
2016-03-14 17:50 ` Greg Kroah-Hartman [this message]
2016-03-14 17:50 ` [PATCH 4.4 11/50] KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 13/50] s390/dasd: fix diag 0x250 inline assembly Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 14/50] tracing: Fix check for cpu online when event is disabled Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 15/50] dmaengine: at_xdmac: fix residue computation Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 16/50] jffs2: reduce the breakage on recovery from halfway failed rename() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 17/50] ncpfs: fix a braino in OOM handling in ncp_fill_cache() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 18/50] ASoC: dapm: Fix ctl value accesses in a wrong type Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 19/50] ASoC: samsung: Use IRQ safe spin lock calls Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 20/50] ASoC: wm8994: Fix enum ctl accesses in a wrong type Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 21/50] ASoC: wm8958: " Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 22/50] ovl: ignore lower entries when checking purity of non-directory entries Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 23/50] ovl: fix working on distributed fs as lower layer Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 24/50] wext: fix message delay/ordering Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 25/50] cfg80211/wext: fix message ordering Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 26/50] can: gs_usb: fixed disconnect bug by removing erroneous use of kfree() Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 27/50] iwlwifi: mvm: inc pending frames counter also when txing non-sta Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 28/50] mac80211: minstrel: Change expected throughput unit back to Kbps Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 29/50] mac80211: fix use of uninitialised values in RX aggregation Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 30/50] mac80211: minstrel_ht: set default tx aggregation timeout to 0 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 32/50] mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 33/50] mac80211: Fix Public Action frame RX in AP mode Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 34/50] gpu: ipu-v3: Do not bail out on missing optional port nodes Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 35/50] x86/mm: Fix slow_virt_to_phys() for X86_PAE again Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 39/50] Revert "drm/radeon/pm: adjust display configuration after powerstate" Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 40/50] powerpc: Fix dedotify for binutils >= 2.26 Greg Kroah-Hartman
2016-03-14 17:50 ` [PATCH 4.4 41/50] powerpc/powernv: Add a kmsg_dumper that flushes console output on panic Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 42/50] powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 43/50] userfaultfd: dont block on the last VM updates at exit time Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 44/50] ovl: copy new uid/gid into overlayfs runtime inode Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 45/50] ovl: fix getcwd() failure after unsuccessful rmdir Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 46/50] MIPS: Fix build error when SMP is used without GIC Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 47/50] MIPS: smp.c: Fix uninitialised temp_foreign_map Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 48/50] block: dont optimize for non-cloned bio in bio_get_last_bvec() Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 49/50] target: Drop incorrect ABORT_TASK put for completed commands Greg Kroah-Hartman
2016-03-14 17:51 ` [PATCH 4.4 50/50] ld-version: Fix awk regex compile failure Greg Kroah-Hartman
2016-03-14 23:12 ` [PATCH 4.4 00/50] 4.4.6-stable review Shuah Khan
2016-03-16 15:40   ` Greg Kroah-Hartman
2016-03-15  2:34 ` Guenter Roeck
2016-03-16 15:41   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160314175014.873643926@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox