public inbox for stable@vger.kernel.org
From: Kamal Mostafa <kamal@canonical.com>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	kernel-team@lists.ubuntu.com
Cc: Andy Lutomirski <luto@amacapital.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Kamal Mostafa <kamal@canonical.com>
Subject: [PATCH 4.2.y-ckt 81/98] KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Date: Tue, 15 Mar 2016 16:31:15 -0700
Message-ID: <1458084692-23100-82-git-send-email-kamal@canonical.com>
In-Reply-To: <1458084692-23100-1-git-send-email-kamal@canonical.com>

4.2.8-ckt6 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Paolo Bonzini <pbonzini@redhat.com>

commit 844a5fe219cf472060315971e15cbf97674a3324 upstream.

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution.  This will still cause a user write to fault, while
supervisor writes will succeed.  User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
will be enabled and supervisor writes disabled, going back to the
original situation where supervisor writes fault spuriously.
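
As an illustration only, the U/W flip described above can be modelled with
a toy state machine.  This is a sketch, not KVM code: struct spte_model,
access_faults() and fix_fault() are hypothetical names, and the model
assumes a guest PTE with u=1/w=0, guest CR0.WP=0, and hardware running
with CR0.WP=1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two shadow PTE bits discussed above. */
struct spte_model {
	bool u;	/* user-accessible */
	bool w;	/* writable */
};

enum access { USER_READ, USER_WRITE, SUPERVISOR_WRITE };

/* Would hardware running with CR0.WP=1 fault on this access? */
static bool access_faults(const struct spte_model *s, enum access a)
{
	switch (a) {
	case USER_READ:
		return !s->u;
	case USER_WRITE:
		return !s->u || !s->w;
	case SUPERVISOR_WRITE:
		return !s->w;
	}
	return true;
}

/*
 * Handle a fault for a guest PTE with u=1/w=0 while guest CR0.WP=0.
 * Returns true if the fault was spurious (the SPTE was flipped and the
 * access should be retried), false if it is a genuine guest fault.
 */
static bool fix_fault(struct spte_model *s, enum access a)
{
	assert(access_faults(s, a));

	switch (a) {
	case SUPERVISOR_WRITE:
		/* Must succeed because guest CR0.WP=0: make it a kernel page. */
		s->u = false;
		s->w = true;
		return true;
	case USER_READ:
		/* Guest PTE has u=1: flip back to a read-only user page. */
		s->u = true;
		s->w = false;
		return true;
	case USER_WRITE:
		/* Guest PTE has w=0: a user write always faults. */
		return false;
	}
	return false;
}
```

Starting from U=1/W=0, a supervisor write flips the model to U=0/W=1, and
a later user read flips it back, which is the ping-pong the text describes.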

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.
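
A minimal sketch of why the fault never goes away, assuming the usual x86
layout where NX is bit 63 of a PAE/64-bit page table entry and EFER.NXE is
bit 11 of the EFER MSR.  PTE_NX_BIT and nx_is_reserved_violation() are
hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_NX_BIT	(1ULL << 63)	/* NX/XD bit in a page table entry */
#define EFER_NX		(1ULL << 11)	/* EFER.NXE */

/*
 * With EFER.NX=0, bit 63 of a page table entry is reserved, so any walk
 * through an entry that has it set raises a reserved-bit page fault.
 */
static bool nx_is_reserved_violation(uint64_t spte, uint64_t efer)
{
	return (spte & PTE_NX_BIT) && !(efer & EFER_NX);
}
```

Because the shadow PTE keeps NX=1 for as long as U=0, every retry of the
access hits the same reserved-bit fault until the effective EFER.NX is 1.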

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).
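
The NX decision can be sketched in isolation as follows.  This mirrors the
logic of the vmx.c hunk in this patch but is a standalone illustration:
struct efer_plan and plan_nx() are hypothetical names, and the two boolean
parameters stand in for enable_ept and boot_cpu_has(X86_FEATURE_SMEP).

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_NX	(1ULL << 11)	/* EFER.NXE */

/* Hypothetical result type: the EFER to load and the bits to ignore. */
struct efer_plan {
	uint64_t guest_efer;
	uint64_t ignore_bits;
};

static struct efer_plan plan_nx(uint64_t guest_efer, bool enable_ept,
				bool host_has_smep)
{
	struct efer_plan p = { .guest_efer = guest_efer, .ignore_bits = 0 };

	if (!enable_ept) {
		if (host_has_smep)
			/* Shadow paging may set spte.nx: NX must be on. */
			p.guest_efer |= EFER_NX;
		else if (!(guest_efer & EFER_NX))
			/* No SMEP: NX can simply follow the host value. */
			p.ignore_bits |= EFER_NX;
	}
	return p;
}
```

With EPT enabled the guest value passes through unchanged; with shadow
paging on a SMEP-capable host, NX is always forced on.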

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/vmx.c                | 36 +++++++++++++++++++++++-------------
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..b653641 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled. We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cb450d8..abf8cc7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1718,26 +1718,31 @@ static void reload_tss(void)
 
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
-	u64 guest_efer;
-	u64 ignore_bits;
+	u64 guest_efer = vmx->vcpu.arch.efer;
+	u64 ignore_bits = 0;
 
-	guest_efer = vmx->vcpu.arch.efer;
+	if (!enable_ept) {
+		/*
+		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
+		 * host CPUID is more efficient than testing guest CPUID
+		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SMEP))
+			guest_efer |= EFER_NX;
+		else if (!(guest_efer & EFER_NX))
+			ignore_bits |= EFER_NX;
+	}
 
 	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
 	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits |= EFER_SCE;
 #ifdef CONFIG_X86_64
 	ignore_bits |= EFER_LMA | EFER_LME;
 	/* SCE is meaningful only in long mode on Intel */
 	if (guest_efer & EFER_LMA)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 
 	clear_atomic_switch_msr(vmx, MSR_EFER);
 
@@ -1748,16 +1753,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	 */
 	if (cpu_has_load_ia32_efer ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;
 		if (guest_efer != host_efer)
 			add_atomic_switch_msr(vmx, MSR_EFER,
 					      guest_efer, host_efer);
 		return false;
-	}
+	} else {
+		guest_efer &= ~ignore_bits;
+		guest_efer |= host_efer & ignore_bits;
 
-	return true;
+		vmx->guest_msrs[efer_offset].data = guest_efer;
+		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+		return true;
+	}
 }
 
 static unsigned long segment_base(u16 selector)
-- 
2.7.0

