stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk
Subject: [ 49/73] i387: fix x86-64 preemption-unsafe user stack save/restore
Date: Mon, 27 Feb 2012 17:02:52 -0800	[thread overview]
Message-ID: <20120228010212.369270454@linuxfoundation.org> (raw)
In-Reply-To: <20120228010246.GA24299@kroah.com>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 upstream.

Commit 5b1cbac37798 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.

However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.

Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.

This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid.  With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.

There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.

However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.

Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.

Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/i387.h |   42 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c     |    1 -
 arch/x86/kernel/xsave.c     |   10 +++-------
 3 files changed, 45 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -400,6 +400,48 @@ static inline void irq_ts_restore(int TS
 }
 
 /*
+ * The question "does this thread have fpu access?"
+ * is slightly racy, since preemption could come in
+ * and revoke it immediately after the test.
+ *
+ * However, even in that very unlikely scenario,
+ * we can just assume we have FPU access - typically
+ * to save the FP state - we'll just take a #NM
+ * fault and get the FPU access back.
+ *
+ * The actual user_fpu_begin/end() functions
+ * need to be preemption-safe, though.
+ *
+ * NOTE! user_fpu_end() must be used only after you
+ * have saved the FP state, and user_fpu_begin() must
+ * be used only immediately before restoring it.
+ * These functions do not do any save/restore on
+ * their own.
+ */
+static inline int user_has_fpu(void)
+{
+	return current_thread_info()->status & TS_USEDFPU;
+}
+
+static inline void user_fpu_end(void)
+{
+	preempt_disable();
+	current_thread_info()->status &= ~TS_USEDFPU;
+	stts();
+	preempt_enable();
+}
+
+static inline void user_fpu_begin(void)
+{
+	preempt_disable();
+	if (!user_has_fpu()) {
+		clts();
+		current_thread_info()->status |= TS_USEDFPU;
+	}
+	preempt_enable();
+}
+
+/*
  * These disable preemption on their own and are safe
  */
 static inline void save_init_fpu(struct task_struct *tsk)
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -777,7 +777,6 @@ EXPORT_SYMBOL_GPL(math_state_restore);
 dotraplinkage void __kprobes
 do_device_not_available(struct pt_regs *regs, long error_code)
 {
-	WARN_ON_ONCE(!user_mode_vm(regs));
 #ifdef CONFIG_MATH_EMULATION
 	if (read_cr0() & X86_CR0_EM) {
 		struct math_emu_info info = { };
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -168,7 +168,7 @@ int save_i387_xstate(void __user *buf)
 	if (!used_math())
 		return 0;
 
-	if (task_thread_info(tsk)->status & TS_USEDFPU) {
+	if (user_has_fpu()) {
 		if (use_xsave())
 			err = xsave_user(buf);
 		else
@@ -176,8 +176,7 @@ int save_i387_xstate(void __user *buf)
 
 		if (err)
 			return err;
-		task_thread_info(tsk)->status &= ~TS_USEDFPU;
-		stts();
+		user_fpu_end();
 	} else {
 		sanitize_i387_state(tsk);
 		if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
@@ -292,10 +291,7 @@ int restore_i387_xstate(void __user *buf
 			return err;
 	}
 
-	if (!(task_thread_info(current)->status & TS_USEDFPU)) {
-		clts();
-		task_thread_info(current)->status |= TS_USEDFPU;
-	}
+	user_fpu_begin();
 	if (use_xsave())
 		err = restore_user_xstate(buf);
 	else



  parent reply	other threads:[~2012-02-28  1:02 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-28  1:02 [ 00/73] 3.0.23-stable review Greg KH
2012-02-28  1:02 ` [ 01/73] ASoC: wm8962: Fix sidetone enumeration texts Greg KH
2012-02-28  1:02 ` [ 02/73] NOMMU: Lock i_mmap_mutex for access to the VMA prio list Greg KH
2012-02-28  1:02 ` [ 03/73] hwmon: (max6639) Fix FAN_FROM_REG calculation Greg KH
2012-02-28  1:02 ` [ 04/73] hwmon: (max6639) Fix PPR register initialization to set both channels Greg KH
2012-02-28  1:02 ` [ 05/73] hwmon: (ads1015) Fix file leak in probe function Greg KH
2012-02-28  1:02 ` [ 06/73] powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events Greg KH
2012-02-28  1:02 ` [ 07/73] drm/radeon/kms: fix MSI re-arm on rv370+ Greg KH
2012-02-28  1:02 ` [ 08/73] PCI: workaround hard-wired bus number V2 Greg KH
2012-02-28  1:02 ` [ 09/73] mac80211: Fix a rwlock bad magic bug Greg KH
2012-02-28  1:02 ` [ 10/73] ipheth: Add iPhone 4S Greg KH
2012-02-28  1:02 ` [ 11/73] eCryptfs: Copy up lower inode attrs after setting lower xattr Greg KH
2012-02-28  1:02 ` [ 12/73] ALSA: hda - Fix redundant jack creations for cx5051 Greg KH
2012-02-28  1:02 ` [ 13/73] mmc: core: check for zero length ioctl data Greg KH
2012-02-28  1:02 ` [ 14/73] NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID Greg KH
2012-02-28  1:02 ` [ 15/73] ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR Greg KH
2012-02-28  1:02 ` [ 16/73] ARM: 7325/1: fix v7 boot with lockdep enabled Greg KH
2012-02-28  1:02 ` [ 17/73] net: Make qdisc_skb_cb upper size bound explicit Greg KH
2012-02-28  1:02 ` [ 18/73] IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses Greg KH
2012-02-28  1:02 ` [ 19/73] gro: more generic L2 header check Greg KH
2012-02-28  1:02 ` [ 20/73] veth: Enforce minimum size of VETH_INFO_PEER Greg KH
2012-02-28  1:02 ` [ 21/73] 3c59x: shorten timer period for slave devices Greg KH
2012-02-28  1:02 ` [ 22/73] ipv6-multicast: Fix memory leak in input path Greg KH
2012-02-28  1:02 ` [ 23/73] ipv6-multicast: Fix memory leak in IPv6 multicast Greg KH
2012-02-28  1:02 ` [ 24/73] ipv4: fix for ip_options_rcv_srr() daddr update Greg KH
2012-02-28  1:02 ` [ 25/73] ipv4: Save nexthop address of LSRR/SSRR option to IPCB Greg KH
2012-02-28  1:02 ` [ 26/73] ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr Greg KH
2012-02-28  1:02 ` [ 27/73] ipv4: reset flowi parameters on route connect Greg KH
2012-02-28  1:02 ` [ 28/73] net: Dont proxy arp respond if iif == rt->dst.dev if private VLAN is disabled Greg KH
2012-02-28  1:02 ` [ 29/73] netpoll: netpoll_poll_dev() should access dev->flags Greg KH
2012-02-28  1:02 ` [ 30/73] net_sched: Bug in netem reordering Greg KH
2012-02-28  1:02 ` [ 31/73] via-velocity: S3 resume fix Greg KH
2012-02-28  1:02 ` [ 32/73] tcp_v4_send_reset: binding oif to iif in no sock case Greg KH
2012-02-28  1:02 ` [ 33/73] tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbs Greg KH
2012-02-28  1:02 ` [ 34/73] tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one() Greg KH
2012-02-28  1:02 ` [ 35/73] tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACK Greg KH
2012-02-28  1:02 ` [ 36/73] route: fix ICMP redirect validation Greg KH
2012-02-28  1:02 ` [ 37/73] ipv4: fix redirect handling Greg KH
2012-02-28  1:02 ` [ 38/73] USB: Added Kamstrup VID/PIDs to cp210x serial driver Greg KH
2012-02-28  1:02 ` [ 39/73] USB: option: cleanup zte 3g-dongles pid in option.c Greg KH
2012-02-28  1:02 ` [ 40/73] USB: Serial: ti_usb_3410_5052: Add Abbot Diabetes Care cable id Greg KH
2012-02-28  1:02 ` [ 41/73] USB: Remove duplicate USB 3.0 hub feature #defines Greg KH
2012-02-28  1:02 ` [ 42/73] USB: Fix handoff when BIOS disables host PCI device Greg KH
2012-02-28  1:02 ` [ 43/73] xhci: Fix oops caused by more USB2 ports than USB3 ports Greg KH
2012-02-28  1:02 ` [ 44/73] xhci: Fix encoding for HS bulk/control NAK rate Greg KH
2012-02-28  1:02 ` [ 45/73] USB: Set hub depth after USB3 hub reset Greg KH
2012-02-28  1:02 ` [ 46/73] i387: math_state_restore() isnt called from asm Greg KH
2012-02-28  1:02 ` [ 47/73] i387: make irq_fpu_usable() tests more robust Greg KH
2012-02-28  1:02 ` [ 48/73] i387: fix sense of sanity check Greg KH
2012-02-28  1:02 ` Greg KH [this message]
2012-02-28  1:02 ` [ 50/73] i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers Greg KH
2012-02-28  1:02 ` [ 51/73] i387: dont ever touch TS_USEDFPU directly, use helper functions Greg KH
2012-02-28  1:02 ` [ 52/73] i387: do not preload FPU state at task switch time Greg KH
2012-02-28  1:02 ` [ 53/73] i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore Greg KH
2012-02-28  1:02 ` [ 54/73] i387: move TS_USEDFPU flag from thread_info to task_struct Greg KH
2012-02-28  1:02 ` [ 55/73] i387: re-introduce FPU state preloading at context switch time Greg KH
2012-02-28  1:02 ` [ 56/73] usb-storage: fix freezing of the scanning thread Greg KH
2012-02-28  1:03 ` [ 57/73] USB: Dont fail USB3 probe on missing legacy PCI IRQ Greg KH
2012-02-28  1:03 ` [ 58/73] x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors Greg KH
2012-02-28  1:03 ` [ 59/73] ath9k: stop on rates with idx -1 in ath9k rate controls .tx_status Greg KH
2012-02-28  1:03 ` [ 60/73] genirq: Unmask oneshot irqs when thread was not woken Greg KH
2012-02-28  1:03 ` [ 61/73] genirq: Handle pending irqs in irq_startup() Greg KH
2012-02-28  1:03 ` [ 62/73] [SCSI] scsi_scan: Fix Poison overwritten warning caused by using freed shost Greg KH
2012-02-28  1:03 ` [ 63/73] [SCSI] scsi_pm: Fix bug in the SCSI power management handler Greg KH
2012-02-28  1:03 ` [ 64/73] ipvs: fix matching of fwmark templates during scheduling Greg KH
2012-02-28  1:03 ` [ 65/73] jme: Fix FIFO flush issue Greg KH
2012-02-28  1:03 ` [ 66/73] davinci_emac: Do not free all rx dma descriptors during init Greg KH
2012-02-28  1:03 ` [ 67/73] builddeb: Dont create files in /tmp with predictable names Greg KH
2012-02-28  1:03 ` [ 68/73] [media] hdpvr: fix race conditon during start of streaming Greg KH
2012-02-28  1:03 ` [ 69/73] hwmon: (f75375s) Fix register write order when setting fans to full speed Greg KH
2012-02-28  1:03 ` [ 70/73] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() Greg KH
2012-02-28  1:03 ` [ 71/73] epoll: ep_unregister_pollwait() can use the freed pwq->whead Greg KH
2012-02-28  1:03 ` [ 72/73] epoll: limit paths Greg KH
2012-02-28  1:03 ` [ 73/73] cdrom: use copy_to_user() without the underscores Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120228010212.369270454@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).