From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
Borislav Petkov <bp@alien8.de>, Borislav Petkov <bpetkov@suse.de>,
Brian Gerst <brgerst@gmail.com>,
Chang Seok <chang.seok.bae@intel.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.12 48/52] x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
Date: Mon, 18 Sep 2017 11:11:42 +0200
Message-ID: <20170918091023.857485102@linuxfoundation.org>
In-Reply-To: <20170918091016.620101134@linuxfoundation.org>

4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@kernel.org>

commit e137a4d8f4dd2e277e355495b6b2cb241a8693c3 upstream.

Switching FS and GS is a mess, and the current code is still subtly
wrong: it assumes that "Loading a nonzero value into FS sets the
index and base", which is false on AMD CPUs if the value being
loaded is 1, 2, or 3.

(The current code came from commit 3e2b68d752c9 ("x86/asm,
sched/x86: Rewrite the FS and GS context switch code"), which made
it better but didn't fully fix it.)

Rewrite it to be much simpler and more obviously correct. This
should fix it fully on AMD CPUs and shouldn't adversely affect
performance.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kernel/process_64.c | 227 +++++++++++++++++++++++--------------------
1 file changed, 122 insertions(+), 105 deletions(-)
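
The changelog's key point is how loading a null selector (0, 1, 2 or 3, all of which name GDT entry 0) treats the hidden segment base. As an illustration only, here is a small user-space model, not kernel code. It assumes, as the comments in save_base_legacy() and load_seg_legacy() state, that !X86_BUG_NULL_SEG CPUs zero the base on a null load while X86_BUG_NULL_SEG (AMD) CPUs keep the old base. `struct seg_model`, `model_load()`, `model_force_zero()` and `MODEL_USER_DS` are hypothetical names; `MODEL_USER_DS` merely stands in for __USER_DS and is assumed to describe a flat segment with base 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __USER_DS; assumed flat (descriptor base 0). */
#define MODEL_USER_DS 0x2b

struct seg_model {
	uint16_t index;	/* the visible selector */
	uint64_t base;	/* the hidden base (FS or GS base) */
};

/* Selectors 0..3 all refer to GDT entry 0, i.e. the null selector. */
static bool is_null_selector(uint16_t sel)
{
	return sel <= 3;
}

/*
 * Model a selector load into FS/GS.  null_seg_bug models X86_BUG_NULL_SEG:
 * on such CPUs a null load leaves the old base in place; on other CPUs it
 * clears the base.  A non-null selector takes desc_base from its descriptor.
 */
static void model_load(struct seg_model *s, uint16_t sel,
		       uint64_t desc_base, bool null_seg_bug)
{
	s->index = sel;
	if (!is_null_selector(sel))
		s->base = desc_base;
	else if (!null_seg_bug)
		s->base = 0;
	/* else: null load on an X86_BUG_NULL_SEG CPU, base unchanged */
}

/*
 * The workaround load_seg_legacy() uses on X86_BUG_NULL_SEG CPUs: load the
 * flat user data segment (base 0), then the wanted null selector, which
 * now "preserves" a base of 0.
 */
static void model_force_zero(struct seg_model *s, uint16_t null_sel,
			     bool null_seg_bug)
{
	model_load(s, MODEL_USER_DS, 0, null_seg_bug);
	model_load(s, null_sel, 0, null_seg_bug);
}
```

In this model, loading selector 3 with `null_seg_bug == true` leaves a stale nonzero base behind, which is exactly the case the old "nonzero value sets the index and base" assumption missed; `model_force_zero()` shows why the __USER_DS-then-null sequence is enough to clear it.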
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -149,6 +149,123 @@ void release_thread(struct task_struct *
}
}
+enum which_selector {
+ FS,
+ GS
+};
+
+/*
+ * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
+ * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
+ * It's forcibly inlined because it'll generate better code and this function
+ * is hot.
+ */
+static __always_inline void save_base_legacy(struct task_struct *prev_p,
+ unsigned short selector,
+ enum which_selector which)
+{
+ if (likely(selector == 0)) {
+ /*
+ * On Intel (without X86_BUG_NULL_SEG), the segment base could
+ * be the pre-existing saved base or it could be zero. On AMD
+ * (with X86_BUG_NULL_SEG), the segment base could be almost
+ * anything.
+ *
+ * This branch is very hot (it's hit twice on almost every
+ * context switch between 64-bit programs), and avoiding
+ * the RDMSR helps a lot, so we just assume that whatever
+ * value is already saved is correct. This matches historical
+ * Linux behavior, so it won't break existing applications.
+ *
+ * To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
+ * report that the base is zero, it needs to actually be zero:
+ * see the corresponding logic in load_seg_legacy.
+ */
+ } else {
+ /*
+ * If the selector is 1, 2, or 3, then the base is zero on
+ * !X86_BUG_NULL_SEG CPUs and could be anything on
+ * X86_BUG_NULL_SEG CPUs. In the latter case, Linux
+ * has never attempted to preserve the base across context
+ * switches.
+ *
+ * If selector > 3, then it refers to a real segment, and
+ * saving the base isn't necessary.
+ */
+ if (which == FS)
+ prev_p->thread.fsbase = 0;
+ else
+ prev_p->thread.gsbase = 0;
+ }
+}
+
+static __always_inline void save_fsgs(struct task_struct *task)
+{
+ savesegment(fs, task->thread.fsindex);
+ savesegment(gs, task->thread.gsindex);
+ save_base_legacy(task, task->thread.fsindex, FS);
+ save_base_legacy(task, task->thread.gsindex, GS);
+}
+
+static __always_inline void loadseg(enum which_selector which,
+ unsigned short sel)
+{
+ if (which == FS)
+ loadsegment(fs, sel);
+ else
+ load_gs_index(sel);
+}
+
+static __always_inline void load_seg_legacy(unsigned short prev_index,
+ unsigned long prev_base,
+ unsigned short next_index,
+ unsigned long next_base,
+ enum which_selector which)
+{
+ if (likely(next_index <= 3)) {
+ /*
+ * The next task is using 64-bit TLS, is not using this
+ * segment at all, or is having fun with arcane CPU features.
+ */
+ if (next_base == 0) {
+ /*
+ * Nasty case: on AMD CPUs, we need to forcibly zero
+ * the base.
+ */
+ if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
+ loadseg(which, __USER_DS);
+ loadseg(which, next_index);
+ } else {
+ /*
+ * We could try to exhaustively detect cases
+ * under which we can skip the segment load,
+ * but there's really only one case that matters
+ * for performance: if both the previous and
+ * next states are fully zeroed, we can skip
+ * the load.
+ *
+ * (This assumes that prev_base == 0 has no
+ * false positives. This is the case on
+ * Intel-style CPUs.)
+ */
+ if (likely(prev_index | next_index | prev_base))
+ loadseg(which, next_index);
+ }
+ } else {
+ if (prev_index != next_index)
+ loadseg(which, next_index);
+ wrmsrl(which == FS ? MSR_FS_BASE : MSR_KERNEL_GS_BASE,
+ next_base);
+ }
+ } else {
+ /*
+ * The next task is using a real segment. Loading the selector
+ * is sufficient.
+ */
+ loadseg(which, next_index);
+ }
+}
+
int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
{
@@ -286,7 +403,6 @@ __switch_to(struct task_struct *prev_p,
struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
- unsigned prev_fsindex, prev_gsindex;
switch_fpu_prepare(prev_fpu, cpu);
@@ -295,8 +411,7 @@ __switch_to(struct task_struct *prev_p,
*
* (e.g. xen_load_tls())
*/
- savesegment(fs, prev_fsindex);
- savesegment(gs, prev_gsindex);
+ save_fsgs(prev_p);
/*
* Load TLS before restoring any segments so that segment loads
@@ -335,108 +450,10 @@ __switch_to(struct task_struct *prev_p,
if (unlikely(next->ds | prev->ds))
loadsegment(ds, next->ds);
- /*
- * Switch FS and GS.
- *
- * These are even more complicated than DS and ES: they have
- * 64-bit bases are that controlled by arch_prctl. The bases
- * don't necessarily match the selectors, as user code can do
- * any number of things to cause them to be inconsistent.
- *
- * We don't promise to preserve the bases if the selectors are
- * nonzero. We also don't promise to preserve the base if the
- * selector is zero and the base doesn't match whatever was
- * most recently passed to ARCH_SET_FS/GS. (If/when the
- * FSGSBASE instructions are enabled, we'll need to offer
- * stronger guarantees.)
- *
- * As an invariant,
- * (fsbase != 0 && fsindex != 0) || (gsbase != 0 && gsindex != 0) is
- * impossible.
- */
- if (next->fsindex) {
- /* Loading a nonzero value into FS sets the index and base. */
- loadsegment(fs, next->fsindex);
- } else {
- if (next->fsbase) {
- /* Next index is zero but next base is nonzero. */
- if (prev_fsindex)
- loadsegment(fs, 0);
- wrmsrl(MSR_FS_BASE, next->fsbase);
- } else {
- /* Next base and index are both zero. */
- if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
- /*
- * We don't know the previous base and can't
- * find out without RDMSR. Forcibly clear it.
- */
- loadsegment(fs, __USER_DS);
- loadsegment(fs, 0);
- } else {
- /*
- * If the previous index is zero and ARCH_SET_FS
- * didn't change the base, then the base is
- * also zero and we don't need to do anything.
- */
- if (prev->fsbase || prev_fsindex)
- loadsegment(fs, 0);
- }
- }
- }
- /*
- * Save the old state and preserve the invariant.
- * NB: if prev_fsindex == 0, then we can't reliably learn the base
- * without RDMSR because Intel user code can zero it without telling
- * us and AMD user code can program any 32-bit value without telling
- * us.
- */
- if (prev_fsindex)
- prev->fsbase = 0;
- prev->fsindex = prev_fsindex;
-
- if (next->gsindex) {
- /* Loading a nonzero value into GS sets the index and base. */
- load_gs_index(next->gsindex);
- } else {
- if (next->gsbase) {
- /* Next index is zero but next base is nonzero. */
- if (prev_gsindex)
- load_gs_index(0);
- wrmsrl(MSR_KERNEL_GS_BASE, next->gsbase);
- } else {
- /* Next base and index are both zero. */
- if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
- /*
- * We don't know the previous base and can't
- * find out without RDMSR. Forcibly clear it.
- *
- * This contains a pointless SWAPGS pair.
- * Fixing it would involve an explicit check
- * for Xen or a new pvop.
- */
- load_gs_index(__USER_DS);
- load_gs_index(0);
- } else {
- /*
- * If the previous index is zero and ARCH_SET_GS
- * didn't change the base, then the base is
- * also zero and we don't need to do anything.
- */
- if (prev->gsbase || prev_gsindex)
- load_gs_index(0);
- }
- }
- }
- /*
- * Save the old state and preserve the invariant.
- * NB: if prev_gsindex == 0, then we can't reliably learn the base
- * without RDMSR because Intel user code can zero it without telling
- * us and AMD user code can program any 32-bit value without telling
- * us.
- */
- if (prev_gsindex)
- prev->gsbase = 0;
- prev->gsindex = prev_gsindex;
+ load_seg_legacy(prev->fsindex, prev->fsbase,
+ next->fsindex, next->fsbase, FS);
+ load_seg_legacy(prev->gsindex, prev->gsbase,
+ next->gsindex, next->gsbase, GS);
switch_fpu_finish(next_fpu, cpu);
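
To make the decision tree in load_seg_legacy() easier to review, here is a rough user-space transcription that records the operations it would perform instead of executing them. It is a sketch, not the kernel's code: `plan_load()`, `struct plan`, `plan_add()` and `MODEL_USER_DS` are hypothetical, `ACT_LOAD_SEL` stands for loadseg(), and `ACT_WRITE_BASE` stands for the wrmsrl() to MSR_FS_BASE or MSR_KERNEL_GS_BASE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __USER_DS; only its being non-null matters. */
#define MODEL_USER_DS 0x2b

enum action {
	ACT_LOAD_SEL,	/* loadseg(which, sel) */
	ACT_WRITE_BASE,	/* wrmsrl(MSR_FS_BASE or MSR_KERNEL_GS_BASE, base) */
};

struct step {
	enum action act;
	uint64_t val;	/* selector for ACT_LOAD_SEL, base for ACT_WRITE_BASE */
};

/* load_seg_legacy() performs at most two operations per segment. */
struct plan {
	struct step steps[2];
	int n;
};

static void plan_add(struct plan *p, enum action act, uint64_t val)
{
	p->steps[p->n].act = act;
	p->steps[p->n].val = val;
	p->n++;
}

/* Mirrors the branches of load_seg_legacy() without touching hardware. */
static struct plan plan_load(uint16_t prev_index, uint64_t prev_base,
			     uint16_t next_index, uint64_t next_base,
			     bool null_seg_bug)
{
	struct plan p = { .n = 0 };

	if (next_index <= 3) {
		/* Null selector: 64-bit TLS, unused segment, or something arcane. */
		if (next_base == 0) {
			if (null_seg_bug) {
				/* AMD-style: force the base to 0 via a flat segment. */
				plan_add(&p, ACT_LOAD_SEL, MODEL_USER_DS);
				plan_add(&p, ACT_LOAD_SEL, next_index);
			} else if (prev_index | next_index | prev_base) {
				/* Intel-style: a null load zeroes the base. */
				plan_add(&p, ACT_LOAD_SEL, next_index);
			}
			/* else: prev and next are both fully zero, skip the load */
		} else {
			if (prev_index != next_index)
				plan_add(&p, ACT_LOAD_SEL, next_index);
			plan_add(&p, ACT_WRITE_BASE, next_base);
		}
	} else {
		/* Real segment: loading the selector is sufficient. */
		plan_add(&p, ACT_LOAD_SEL, next_index);
	}
	return p;
}
```

For example, `plan_load(0, 0, 0, 0, false)` produces no steps, the fast path for two 64-bit tasks that never touch the segment, while the same call with `null_seg_bug` set yields the two-step __USER_DS-then-null sequence.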