From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
Borislav Petkov <bp@suse.de>, Brian Gerst <brgerst@gmail.com>,
Dave Hansen <dave.hansen@intel.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
"Elliott, Robert (Persistent Memory)" <elliott@hpe.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
linux-mm@kvack.org, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.13 20/52] x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
Date: Mon, 18 Sep 2017 11:09:48 +0200 [thread overview]
Message-ID: <20170918090907.041255824@linuxfoundation.org> (raw)
In-Reply-To: <20170918090904.072766209@linuxfoundation.org>
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Luck <tony.luck@intel.com>
commit ce0fa3e56ad20f04d8252353dcd24e924abdafca upstream.
Speculative processor accesses may reference any memory that has a
valid page table entry. While a speculative access won't generate
a machine check, it will log the error in a machine check bank. That
could cause escalation of a subsequent error since the overflow bit
will be then set in the machine check bank status register.
Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
address of the page we want to map out otherwise we may trigger the
very problem we are trying to avoid. We use a non-canonical address
that passes through the usual Linux table walking code to get to the
same "pte".
Thanks to Dave Hansen for reviewing several iterations of this.
Also see:
http://marc.info/?l=linux-mm&m=149860136413338&w=2
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/page_64.h | 4 +++
arch/x86/kernel/cpu/mcheck/mce.c | 43 +++++++++++++++++++++++++++++++++++++++
include/linux/mm_inline.h | 6 +++++
mm/memory-failure.c | 2 +
4 files changed, 55 insertions(+)
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -51,6 +51,10 @@ static inline void clear_page(void *page
void copy_page(void *to, void *from);
+#ifdef CONFIG_X86_MCE
+#define arch_unmap_kpfn arch_unmap_kpfn
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_X86_VSYSCALL_EMULATION
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -51,6 +51,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/reboot.h>
+#include <asm/set_memory.h>
#include "mce-internal.h"
@@ -1051,6 +1052,48 @@ static int do_memory_failure(struct mce
return ret;
}
+#if defined(arch_unmap_kpfn) && defined(CONFIG_MEMORY_FAILURE)
+
+void arch_unmap_kpfn(unsigned long pfn)
+{
+ unsigned long decoy_addr;
+
+ /*
+ * Unmap this page from the kernel 1:1 mappings to make sure
+ * we don't log more errors because of speculative access to
+ * the page.
+ * We would like to just call:
+ * set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
+ * but doing that would radically increase the odds of a
+ * speculative access to the posion page because we'd have
+ * the virtual address of the kernel 1:1 mapping sitting
+ * around in registers.
+ * Instead we get tricky. We create a non-canonical address
+ * that looks just like the one we want, but has bit 63 flipped.
+ * This relies on set_memory_np() not checking whether we passed
+ * a legal address.
+ */
+
+/*
+ * Build time check to see if we have a spare virtual bit. Don't want
+ * to leave this until run time because most developers don't have a
+ * system that can exercise this code path. This will only become a
+ * problem if/when we move beyond 5-level page tables.
+ *
+ * Hard code "9" here because cpp doesn't grok ilog2(PTRS_PER_PGD)
+ */
+#if PGDIR_SHIFT + 9 < 63
+ decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
+#else
+#error "no unused virtual bit available"
+#endif
+
+ if (set_memory_np(decoy_addr, 1))
+ pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+
+}
+#endif
+
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -126,4 +126,10 @@ static __always_inline enum lru_list pag
#define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
+#ifdef arch_unmap_kpfn
+extern void arch_unmap_kpfn(unsigned long pfn);
+#else
+static __always_inline void arch_unmap_kpfn(unsigned long pfn) { }
+#endif
+
#endif
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,6 +1146,8 @@ int memory_failure(unsigned long pfn, in
return 0;
}
+ arch_unmap_kpfn(pfn);
+
orig_head = hpage = compound_head(p);
num_poisoned_pages_inc();
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
Borislav Petkov <bp@suse.de>, Brian Gerst <brgerst@gmail.com>,
Dave Hansen <dave.hansen@intel.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
"Elliott, Robert (Persistent Memory)" <elliott@hpe.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
linux-mm@kvack.org, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.13 20/52] x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
Date: Mon, 18 Sep 2017 11:09:48 +0200 [thread overview]
Message-ID: <20170918090907.041255824@linuxfoundation.org> (raw)
In-Reply-To: <20170918090904.072766209@linuxfoundation.org>
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Luck <tony.luck@intel.com>
commit ce0fa3e56ad20f04d8252353dcd24e924abdafca upstream.
Speculative processor accesses may reference any memory that has a
valid page table entry. While a speculative access won't generate
a machine check, it will log the error in a machine check bank. That
could cause escalation of a subsequent error since the overflow bit
will be then set in the machine check bank status register.
Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
address of the page we want to map out otherwise we may trigger the
very problem we are trying to avoid. We use a non-canonical address
that passes through the usual Linux table walking code to get to the
same "pte".
Thanks to Dave Hansen for reviewing several iterations of this.
Also see:
http://marc.info/?l=linux-mm&m=149860136413338&w=2
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/include/asm/page_64.h | 4 +++
arch/x86/kernel/cpu/mcheck/mce.c | 43 +++++++++++++++++++++++++++++++++++++++
include/linux/mm_inline.h | 6 +++++
mm/memory-failure.c | 2 +
4 files changed, 55 insertions(+)
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -51,6 +51,10 @@ static inline void clear_page(void *page
void copy_page(void *to, void *from);
+#ifdef CONFIG_X86_MCE
+#define arch_unmap_kpfn arch_unmap_kpfn
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_X86_VSYSCALL_EMULATION
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -51,6 +51,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/reboot.h>
+#include <asm/set_memory.h>
#include "mce-internal.h"
@@ -1051,6 +1052,48 @@ static int do_memory_failure(struct mce
return ret;
}
+#if defined(arch_unmap_kpfn) && defined(CONFIG_MEMORY_FAILURE)
+
+void arch_unmap_kpfn(unsigned long pfn)
+{
+ unsigned long decoy_addr;
+
+ /*
+ * Unmap this page from the kernel 1:1 mappings to make sure
+ * we don't log more errors because of speculative access to
+ * the page.
+ * We would like to just call:
+ * set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
+ * but doing that would radically increase the odds of a
+ * speculative access to the posion page because we'd have
+ * the virtual address of the kernel 1:1 mapping sitting
+ * around in registers.
+ * Instead we get tricky. We create a non-canonical address
+ * that looks just like the one we want, but has bit 63 flipped.
+ * This relies on set_memory_np() not checking whether we passed
+ * a legal address.
+ */
+
+/*
+ * Build time check to see if we have a spare virtual bit. Don't want
+ * to leave this until run time because most developers don't have a
+ * system that can exercise this code path. This will only become a
+ * problem if/when we move beyond 5-level page tables.
+ *
+ * Hard code "9" here because cpp doesn't grok ilog2(PTRS_PER_PGD)
+ */
+#if PGDIR_SHIFT + 9 < 63
+ decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
+#else
+#error "no unused virtual bit available"
+#endif
+
+ if (set_memory_np(decoy_addr, 1))
+ pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+
+}
+#endif
+
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -126,4 +126,10 @@ static __always_inline enum lru_list pag
#define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
+#ifdef arch_unmap_kpfn
+extern void arch_unmap_kpfn(unsigned long pfn);
+#else
+static __always_inline void arch_unmap_kpfn(unsigned long pfn) { }
+#endif
+
#endif
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,6 +1146,8 @@ int memory_failure(unsigned long pfn, in
return 0;
}
+ arch_unmap_kpfn(pfn);
+
orig_head = hpage = compound_head(p);
num_poisoned_pages_inc();
next prev parent reply other threads:[~2017-09-18 9:19 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-18 9:09 [PATCH 4.13 00/52] 4.13.3-stable review Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 01/52] Revert "net: use lib/percpu_counter API for fragmentation mem accounting" Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 02/52] Revert "net: fix percpu memory leaks" Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 03/52] gianfar: Fix Tx flow control deactivation Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 04/52] vhost_net: correctly check tx avail during rx busy polling Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 05/52] ip6_gre: update mtu properly in ip6gre_err Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 06/52] udp: drop head states only when all skb references are gone Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 07/52] ipv6: fix memory leak with multiple tables during netns destruction Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 08/52] ipv6: fix typo in fib6_net_exit() Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 09/52] sctp: fix missing wake ups in some situations Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 10/52] tcp: fix a request socket leak Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 11/52] ip_tunnel: fix setting ttl and tos value in collect_md mode Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 12/52] f2fs: let fill_super handle roll-forward errors Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 13/52] f2fs: check hot_data for roll-forward recovery Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 14/52] thunderbolt: Remove superfluous check Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 15/52] thunderbolt: Make key root-only accessible Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 16/52] thunderbolt: Allow clearing the key Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 17/52] x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 18/52] x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 19/52] x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs Greg Kroah-Hartman
2017-09-18 9:09 ` Greg Kroah-Hartman [this message]
2017-09-18 9:09 ` [PATCH 4.13 20/52] x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 21/52] ovl: fix false positive ESTALE on lookup Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 22/52] fuse: allow server to run in different pid_ns Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 23/52] idr: remove WARN_ON_ONCE() when trying to replace negative ID Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 24/52] libnvdimm, btt: check memory allocation failure Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 25/52] libnvdimm: fix integer overflow static analysis warning Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 26/52] xfs: write unmount record for ro mounts Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 27/52] xfs: toggle readonly state around xfs_log_mount_finish Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 28/52] xfs: Add infrastructure needed for error propagation during buffer IO failure Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 29/52] xfs: Properly retry failed inode items in case of error during buffer writeback Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 30/52] xfs: fix recovery failure when log record header wraps log end Greg Kroah-Hartman
2017-09-18 9:09 ` [PATCH 4.13 31/52] xfs: always verify the log tail during recovery Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 32/52] xfs: fix log recovery corruption error due to tail overwrite Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 33/52] xfs: handle -EFSCORRUPTED during head/tail verification Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 34/52] xfs: stop searching for free slots in an inode chunk when there are none Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 35/52] xfs: evict all inodes involved with log redo item Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 36/52] xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster() Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 37/52] xfs: open-code xfs_buf_item_dirty() Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 38/52] xfs: remove unnecessary dirty bli format check for ordered bufs Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 39/52] xfs: ordered buffer log items are never formatted Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 40/52] xfs: refactor buffer logging into buffer dirtying helper Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 41/52] xfs: dont log dirty ranges for ordered buffers Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 42/52] xfs: skip bmbt block ino validation during owner change Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 43/52] xfs: move bmbt owner change to last step of extent swap Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 44/52] xfs: disallow marking previously dirty buffers as ordered Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 45/52] xfs: relog dirty buffers during swapext bmbt owner change Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 46/52] xfs: disable per-inode DAX flag Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 47/52] xfs: fix incorrect log_flushed on fsync Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 48/52] xfs: dont set v3 xflags for v2 inodes Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 49/52] xfs: open code end_buffer_async_write in xfs_finish_page_writeback Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 50/52] xfs: use kmem_free to free return value of kmem_zalloc Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 51/52] md/raid1/10: reset bio allocated from mempool Greg Kroah-Hartman
2017-09-18 9:10 ` [PATCH 4.13 52/52] md/raid5: release/flush io in raid5_do_work() Greg Kroah-Hartman
2017-09-18 19:29 ` [PATCH 4.13 00/52] 4.13.3-stable review Guenter Roeck
2017-09-18 20:17 ` Shuah Khan
2017-09-19 6:33 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170918090907.041255824@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=bp@suse.de \
--cc=brgerst@gmail.com \
--cc=dave.hansen@intel.com \
--cc=dvlasenk@redhat.com \
--cc=elliott@hpe.com \
--cc=hpa@zytor.com \
--cc=jpoimboe@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.