From mboxrd@z Thu Jan  1 00:00:00 1970
Return-path: <ak@linux.intel.com>
Received: from mga03.intel.com ([134.134.136.65])
	by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from <ak@linux.intel.com>)
	id 1fGX2a-00089a-Ho
	for speck@linutronix.de; Wed, 09 May 2018 23:54:29 +0200
Date: Wed, 9 May 2018 14:54:25 -0700
From: Andi Kleen <ak@linux.intel.com>
Subject: [MODERATED] Re: [PATCH v4 0/8] L1TFv4 0
Message-ID: <20180509215425.GA31444@tassilo.jf.intel.com>
References: <cover.1525900921.git.ak@linux.intel.com>
MIME-Version: 1.0
In-Reply-To: <cover.1525900921.git.ak@linux.intel.com>
Content-Type: multipart/mixed; boundary="mP3DRpeJDSE+ciuQ"
Content-Disposition: inline
To: speck@linutronix.de
List-ID: <speck.linutronix.de>


--mP3DRpeJDSE+ciuQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline


And here's an mbox for easier review/applying.
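
A user-space model of the new x86-64 swap entry layout in patch 2/8 may help when reviewing it. It is only a sketch, and it makes three assumptions. First, _PAGE_BIT_PROTNONE is bit 8 (the aliased G bit, per the pgtable_64.h comment), with 5 type bits. Second, it uses plain uint64_t values instead of swp_entry_t. Third, top_pa_bit_set() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values: G (bit 8) doubles as _PAGE_BIT_PROTNONE. */
#define PAGE_BIT_PROTNONE	8
#define SWP_TYPE_BITS		5
#define SWP_OFFSET_FIRST_BIT	(PAGE_BIT_PROTNONE + 1)
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

/* Type goes in bits 59-63, the inverted offset in bits 9-58. */
static uint64_t swp_entry(uint64_t type, uint64_t offset)
{
	assert(type < (1ULL << SWP_TYPE_BITS));
	return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static uint64_t swp_type(uint64_t val)
{
	return val >> (64 - SWP_TYPE_BITS);
}

/* Undo the inversion, shift out the type bits, then move the offset down. */
static uint64_t swp_offset(uint64_t val)
{
	return ~val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
}

/* Hypothetical helper: is physical address bit (phys_bits - 1) set? */
static int top_pa_bit_set(uint64_t pte, unsigned int phys_bits)
{
	return (int)((pte >> (phys_bits - 1)) & 1);
}
```

With a 40-bit MAX_PA, every offset below 2^30 pages keeps bit 39 set. That covers 2^30 * 4 KiB = 4 TiB of swap, as the 2/8 description says. For this layout, swp_offset(swp_entry(0, ~0UL)) + 1 is also the 2^50-page value that generic_max_swapfile_size() in 7/8 would compute.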
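
The PROT_NONE inversion in patch 3/8 can be modelled in user space too. This sketch assumes _PAGE_PRESENT is bit 0, _PAGE_RW is bit 1, _PAGE_PROTNONE is bit 8, and PTE_PFN_MASK covers bits 12-51. Its pte_modify() is simplified to "keep the PFN bits, replace the protection bits" and skips _PAGE_CHG_MASK and check_pgprot().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 bit positions; PROT_NONE reuses the G bit (8). */
#define PAGE_SHIFT	12
#define PAGE_PRESENT	(1ULL << 0)
#define PAGE_RW		(1ULL << 1)
#define PAGE_PROTNONE	(1ULL << 8)
/* PFN bits 12-51, as with a 52-bit __PHYSICAL_MASK. */
#define PTE_PFN_MASK	(((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/* Not present but PROT_NONE: the PFN part is stored inverted. */
static int pte_needs_invert(uint64_t val)
{
	return (val & (PAGE_PRESENT | PAGE_PROTNONE)) == PAGE_PROTNONE;
}

/* Mask to XOR with an entry to get the real PFN bits back. */
static uint64_t protnone_mask(uint64_t val)
{
	return pte_needs_invert(val) ? ~0ULL : 0;
}

static uint64_t pfn_pte(uint64_t pfn, uint64_t prot)
{
	uint64_t pa = pfn << PAGE_SHIFT;

	assert((pa & ~PTE_PFN_MASK) == 0);	/* PFN must fit the mask */
	pa ^= protnone_mask(prot);
	pa &= PTE_PFN_MASK;
	return pa | prot;
}

static uint64_t pte_pfn(uint64_t pte)
{
	uint64_t val = pte ^ protnone_mask(pte);

	return (val & PTE_PFN_MASK) >> PAGE_SHIFT;
}

/* Invert the PFN bits whenever the PROT_NONE bit changes. */
static uint64_t flip_protnone_guard(uint64_t oldval, uint64_t val, uint64_t mask)
{
	if ((oldval & PAGE_PROTNONE) != (val & PAGE_PROTNONE))
		val = (val & ~mask) | (~val & mask);
	return val;
}

/* Simplified pte_modify(): keep the PFN bits, replace the protection bits. */
static uint64_t pte_modify(uint64_t pte, uint64_t newprot)
{
	uint64_t val = (pte & PTE_PFN_MASK) | newprot;

	return flip_protnone_guard(pte, val, PTE_PFN_MASK);
}
```

This model shows three properties:
- Switching an entry to PROT_NONE and back returns the original value.
- pte_pfn() sees the same PFN in both states.
- For a small PFN, the PROT_NONE form has the high physical bits (for example bit 45) set.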
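
The swap size cap in patch 7/8 is simple arithmetic. maxpa_pfn_bit(1) is the PFN of MAX_PA/2, and max_swapfile_size() takes the smaller of that and the generic limit. The sketch below assumes the physical address width is passed in rather than read from boot_cpu_data.x86_phys_bits. l1tf_max_swapfile_pages() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Like maxpa_pfn_bit() in 7/8, with the physical address width as a
 * parameter instead of boot_cpu_data.x86_phys_bits. */
static uint64_t maxpa_pfn_bit(unsigned int phys_bits, unsigned int offset)
{
	assert(phys_bits > PAGE_SHIFT + offset && phys_bits <= 52);
	return 1ULL << (phys_bits - offset - PAGE_SHIFT);
}

/* Swap file limit in pages: the encoding limit, capped at MAX_PA/2. */
static uint64_t l1tf_max_swapfile_pages(unsigned int phys_bits,
					uint64_t generic_pages)
{
	uint64_t l1tf_pages = maxpa_pfn_bit(phys_bits, 1);

	return l1tf_pages < generic_pages ? l1tf_pages : generic_pages;
}
```

With 4 KiB pages, this gives the figures in the 7/8 description:
- 42 physical bits: 2^29 pages (2 TiB).
- 46 physical bits: 2^33 pages (32 TiB).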


--mP3DRpeJDSE+ciuQ
Content-Type: application/vnd.wolfram.mathematica.package
Content-Disposition: attachment; filename=m
Content-Transfer-Encoding: quoted-printable

=46rom 0c2eb2235d5476b216693f1e9ec8394d58af20b3 Mon Sep 17 00:00:00 2001=0A=
=46rom: Andi Kleen <ak@linux.intel.com>=0ADate: Thu, 3 May 2018 08:35:42 -0=
700=0ASubject: [PATCH 1/8] x86, l1tf: Increase 32bit PAE __PHYSICAL_PAGE_MA=
SK=0AStatus: RO=0AContent-Length: 1575=0ALines: 43=0A=0AOn 32bit PAE the ma=
x PTE mask is currently set to 44 bit because that is=0Athe limit imposed b=
y 32bit unsigned long PFNs in the VMs.=0A=0AThe L1TF PROT_NONE protection c=
ode uses the PTE masks to determine=0Awhat bits to invert to make sure the =
higher bits are set for unmapped=0Aentries to prevent L1TF speculation atta=
cks against EPT inside guests.=0A=0ABut our inverted mask has to match the =
host, and the host is likely=0A64bit and may use more than 43 bits of memor=
y. We want to set=0Aall possible bits to be safe here.=0A=0ASo increase the=
 mask on 32bit PAE to 52 to match 64bit. The real=0Alimit is still 44 bits =
but outside the inverted PTEs these=0Ahigher bits are set, so a bigger mask=
 doesn't cause any problems.=0A=0ASigned-off-by: Andi Kleen <ak@linux.intel.=
com>=0A---=0A arch/x86/include/asm/page_32_types.h | 9 +++++++--=0A 1 file =
changed, 7 insertions(+), 2 deletions(-)=0A=0Adiff --git a/arch/x86/include=
/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h=0Aindex aa30c32=
41ea7..0d5c739eebd7 100644=0A--- a/arch/x86/include/asm/page_32_types.h=0A+=
++ b/arch/x86/include/asm/page_32_types.h=0A@@ -29,8 +29,13 @@=0A #define N=
_EXCEPTION_STACKS 1=0A =0A #ifdef CONFIG_X86_PAE=0A-/* 44=3D32+12, the limi=
t we can fit into an unsigned long pfn */=0A-#define __PHYSICAL_MASK_SHIFT	=
44=0A+/*=0A+ * This is beyond the 44 bit limit imposed by the 32bit long pf=
ns,=0A+ * but we need the full mask to make sure inverted PROT_NONE=0A+ * e=
ntries have all the host bits set in a guest.=0A+ * The real limit is still=
 44 bits.=0A+ */=0A+#define __PHYSICAL_MASK_SHIFT	52=0A #define __VIRTUAL_M=
ASK_SHIFT	32=0A =0A #else  /* !CONFIG_X86_PAE */=0A-- =0A2.14.3=0A=0A=0AFro=
m 1bef0e393f925379b76cb689bfb3fdbfc052e716 Mon Sep 17 00:00:00 2001=0AFrom:=
 Linus Torvalds <torvalds@linux-foundation.org>=0ADate: Fri, 27 Apr 2018 09=
:06:34 -0700=0ASubject: [PATCH 2/8] x86, l1tf: Protect swap entries against=
 L1TF=0AStatus: RO=0AContent-Length: 4505=0ALines: 108=0A=0AWith L1 termina=
l fault the CPU speculates into unmapped PTEs, and=0Aresulting side effects=
 allow reading the memory the PTE is pointing=0Ato, if its values are stil=
l in the L1 cache.=0A=0AFor swapped out pages Linux uses unmapped PTEs and =
stores a swap entry=0Ainto them.=0A=0AWe need to make sure the swap entry i=
s not pointing to valid memory,=0Awhich requires setting higher bits (betwe=
en bit 36 and bit 45) that=0Aare inside the CPU's physical address space, bu=
t outside any real=0Amemory.=0A=0ATo do this we invert the offset to make s=
ure the higher bits are always=0Aset, as long as the swap file is not too b=
ig.=0A=0AHere's a patch that switches the order of "type" and=0A"offset" in=
 the x86-64 encoding, in addition to doing the binary 'not' on=0Athe offset=
=2E=0A=0AThat means that now the offset is bits 9-58 in the page table, and=
 that=0Athe offset is in the bits that hardware generally doesn't care abou=
t.=0A=0AThat, in turn, means that if you have a desktop chip with only 40 b=
its of=0Aphysical addressing, now that the offset starts at bit 9, you stil=
l have=0Ato have 30 bits of offset actually *in use* until bit 39 ends up b=
eing=0Aclear.=0A=0ASo that's 4 terabytes of swap space (because the offset i=
s counted in=0Apages, so 30 bits of offset is 42 bits of actual coverage). =
With bigger=0Aphysical addressing, that obviously grows further, until you =
hit the limit=0Aof the offset (at 50 bits of offset - 62 bits of actual swa=
p file=0Acoverage).=0A=0ANote there is no workaround for 32bit !PAE, or on =
systems which=0Ahave more than MAX_PA/2 memory. The latter case is very unli=
kely=0Ato happen on real systems.=0A=0A[updated description and minor tweak=
s by AK]=0A=0ASigned-off-by: Linus Torvalds <torvalds@linux-foundation.org>=
=0ASigned-off-by: Andi Kleen <ak@linux.intel.com>=0ATested-by: Andi Kleen <=
ak@linux.intel.com>=0AAcked-by: Michal Hocko <mhocko@suse.com>=0A---=0A arc=
h/x86/include/asm/pgtable_64.h | 36 +++++++++++++++++++++++++-----------=0A=
 1 file changed, 25 insertions(+), 11 deletions(-)=0A=0Adiff --git a/arch/x=
86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h=0Aindex 877=
bc27718ae..593c3cf259dd 100644=0A--- a/arch/x86/include/asm/pgtable_64.h=0A=
+++ b/arch/x86/include/asm/pgtable_64.h=0A@@ -273,7 +273,7 @@ static inline=
 int pgd_large(pgd_t pgd) { return 0; }=0A  *=0A  * |     ...            | =
11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number=0A  * |     ...            |=
SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names=0A- * | OFFSET (14->63) | TY=
PE (9-13)  |0|0|X|X| X| X|X|SD|0| <- swp entry=0A+ * | TYPE (59-63) | ~OFFS=
ET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry=0A  *=0A  * G (8) is aliased=
 and used as a PROT_NONE indicator for=0A  * !present ptes.  We need to sta=
rt storing swap entries above=0A@@ -286,20 +286,34 @@ static inline int pgd=
_large(pgd_t pgd) { return 0; }=0A  *=0A  * Bit 7 in swp entry should be 0 =
because pmd_present checks not only P,=0A  * but also L and G.=0A+ *=0A+ * =
The offset is inverted by a binary not operation to make the high=0A+ * phy=
sical bits set.=0A  */=0A-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + =
1)=0A-#define SWP_TYPE_BITS 5=0A-/* Place the offset above the type: */=0A-=
#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)=0A+#defin=
e SWP_TYPE_BITS		5=0A+=0A+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE =
+ 1)=0A+=0A+/* We always extract/encode the offset by shifting it all the w=
ay up, and then down again */=0A+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST=
_BIT+SWP_TYPE_BITS)=0A =0A #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_S=
WAPFILES_SHIFT > SWP_TYPE_BITS)=0A =0A-#define __swp_type(x)			(((x).val >>=
 (SWP_TYPE_FIRST_BIT)) \=0A-					 & ((1U << SWP_TYPE_BITS) - 1))=0A-#define=
 __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)=0A-#define __swp_entry=
(type, offset)	((swp_entry_t) { \=0A-					 ((type) << (SWP_TYPE_FIRST_BIT))=
 \=0A-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })=0A+/* Extract the high =
bits for type */=0A+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))=
=0A+=0A+/* Shift up (to get rid of type), then down to get value */=0A+#def=
ine __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)=0A+=0A+=
/*=0A+ * Shift the offset up "too far" by TYPE bits, then down again=0A+ * =
The offset is inverted by a binary not operation to make the high=0A+ * phy=
sical bits set.=0A+ */=0A+#define __swp_entry(type, offset) ((swp_entry_t) =
{ \=0A+	(~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \=
=0A+	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })=0A+=0A #define __pt=
e_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })=0A #define __pmd_to=
_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })=0A #define __swp_entry_=
to_pte(x)		((pte_t) { .pte =3D (x).val })=0A-- =0A2.14.3=0A=0A=0AFrom 07a23=
314494bcaf78e47852462364a6d57e9b3b1 Mon Sep 17 00:00:00 2001=0AFrom: Andi K=
leen <ak@linux.intel.com>=0ADate: Fri, 27 Apr 2018 09:47:37 -0700=0ASubject=
: [PATCH 3/8] x86, l1tf: Protect PROT_NONE PTEs against speculation=0AStatu=
s: O=0AContent-Length: 8094=0ALines: 254=0A=0AWe also need to protect PTEs =
that are set to PROT_NONE against=0AL1TF speculation attacks.=0A=0AThis is =
important inside guests, because L1TF speculation=0Abypasses physical page =
remapping. While the VM has its own=0Amitigations preventing leaking data f=
rom other VMs into=0Athe guest, this would still risk leaking the wrong pag=
e=0Ainside the current guest.=0A=0AThis uses the same technique as Linus' s=
wap entry patch:=0Awhile an entry is in PROTNONE state we invert the=0Ac=
omplete PFN part of it. This ensures that the=0Ahighest bit will p=
oint to non-existent memory.=0A=0AThe invert is done by pte/pmd/pud_modify =
and pfn/pmd/pud_pte for=0APROTNONE and pte/pmd/pud_pfn undo it.=0A=0AWe ass=
ume that no one tries to touch the PFN part of=0Aa PTE without using these p=
rimitives.=0A=0AThis doesn't handle the case that MMIO is on the top=0Aof t=
he CPU physical memory. If such an MMIO region=0Awas exposed by an unprivil=
edged driver for mmap=0Ait would be possible to attack some real memory.=0A=
However this situation is all rather unlikely.=0A=0AFor 32bit non PAE we do=
n't try inversion because=0Athere are really not enough bits to protect any=
thing.=0A=0AQ: Why does the guest need to be protected when the=0Ahyperviso=
r already has L1TF mitigations?=0AA: Here's an example:=0AYou have physical=
 pages 1 2. They get mapped into a guest as=0AGPA 1 -> PA 2=0AGPA 2 -> PA 1=
=0Athrough EPT.=0A=0AThe L1TF speculation ignores the EPT remapping.=0A=0AN=
ow the guest kernel maps GPA 1 to process A and GPA 2 to process B,=0Aand t=
hey belong to different users and should be isolated.=0A=0AA sets the GPA 1=
 PA 2 PTE to PROT_NONE to bypass the EPT remapping=0Aand gets read access t=
o the underlying physical page. Which=0Ain this case points to PA 2, so it =
can read process B's data,=0Aif it happened to be in L1.=0A=0ASo we broke i=
solation inside the guest.=0A=0AThere's nothing the hypervisor can do about=
 this. This=0Amitigation has to be done in the guest.=0A=0Av2: Use new help=
er to generate XOR mask to invert (Linus)=0Av3: Use inline helper for protn=
one mask checking=0ASigned-off-by: Andi Kleen <ak@linux.intel.com>=0AAcked-=
by: Michal Hocko <mhocko@suse.com>=0A---=0A arch/x86/include/asm/pgtable-2l=
evel.h | 17 ++++++++++++++=0A arch/x86/include/asm/pgtable-3level.h |  2 ++=
=0A arch/x86/include/asm/pgtable-invert.h | 32 +++++++++++++++++++++++++=0A=
 arch/x86/include/asm/pgtable.h        | 44 ++++++++++++++++++++++++-------=
----=0A arch/x86/include/asm/pgtable_64.h     |  2 ++=0A 5 files changed, 8=
4 insertions(+), 13 deletions(-)=0A create mode 100644 arch/x86/include/asm=
/pgtable-invert.h=0A=0Adiff --git a/arch/x86/include/asm/pgtable-2level.h b=
/arch/x86/include/asm/pgtable-2level.h=0Aindex 685ffe8a0eaf..60d0f9015317 1=
00644=0A--- a/arch/x86/include/asm/pgtable-2level.h=0A+++ b/arch/x86/includ=
e/asm/pgtable-2level.h=0A@@ -95,4 +95,21 @@ static inline unsigned long pte=
_bitop(unsigned long value, unsigned int rightshi=0A #define __pte_to_swp_e=
ntry(pte)		((swp_entry_t) { (pte).pte_low })=0A #define __swp_entry_to_pte(=
x)		((pte_t) { .pte =3D (x).val })=0A =0A+/* No inverted PFNs on 2 level pa=
ge tables */=0A+=0A+static inline u64 protnone_mask(u64 val)=0A+{=0A+	retur=
n 0;=0A+}=0A+=0A+static inline u64 flip_protnone_guard(u64 oldval, u64 val,=
 u64 mask)=0A+{=0A+	return val;=0A+}=0A+=0A+static inline bool __pte_needs_=
invert(u64 val)=0A+{=0A+	return false;=0A+}=0A+=0A #endif /* _ASM_X86_PGTAB=
LE_2LEVEL_H */=0Adiff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/=
x86/include/asm/pgtable-3level.h=0Aindex f24df59c40b2..76ab26a99e6e 100644=
=0A--- a/arch/x86/include/asm/pgtable-3level.h=0A+++ b/arch/x86/include/asm=
/pgtable-3level.h=0A@@ -295,4 +295,6 @@ static inline pte_t gup_get_pte(pte=
_t *ptep)=0A 	return pte;=0A }=0A =0A+#include <asm/pgtable-invert.h>=0A+=
=0A #endif /* _ASM_X86_PGTABLE_3LEVEL_H */=0Adiff --git a/arch/x86/include/=
asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h=0Anew file mod=
e 100644=0Aindex 000000000000..c740606b0c02=0A--- /dev/null=0A+++ b/arch/x8=
6/include/asm/pgtable-invert.h=0A@@ -0,0 +1,32 @@=0A+/* SPDX-License-Identi=
fier: GPL-2.0 */=0A+#ifndef _ASM_PGTABLE_INVERT_H=0A+#define _ASM_PGTABLE_I=
NVERT_H 1=0A+=0A+#ifndef __ASSEMBLY__=0A+=0A+static inline bool __pte_needs=
_invert(u64 val)=0A+{=0A+	return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) =3D=
=3D _PAGE_PROTNONE;=0A+}=0A+=0A+/* Get a mask to xor with the page table en=
try to get the correct pfn. */=0A+static inline u64 protnone_mask(u64 val)=
=0A+{=0A+	return __pte_needs_invert(val) ?  ~0ull : 0;=0A+}=0A+=0A+static i=
nline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)=0A+{=0A+	/*=0A=
+	 * When a PTE transitions from NONE to !NONE or vice-versa=0A+	 * invert =
the PFN part to stop speculation.=0A+	 * pte_pfn undoes this when needed.=
=0A+	 */=0A+	if ((oldval & _PAGE_PROTNONE) !=3D (val & _PAGE_PROTNONE))=0A+=
		val =3D (val & ~mask) | (~val & mask);=0A+	return val;=0A+}=0A+=0A+#endif=
 /* __ASSEMBLY__ */=0A+=0A+#endif=0Adiff --git a/arch/x86/include/asm/pgtab=
le.h b/arch/x86/include/asm/pgtable.h=0Aindex 5f49b4ff0c24..f811e3257e87 10=
0644=0A--- a/arch/x86/include/asm/pgtable.h=0A+++ b/arch/x86/include/asm/pg=
table.h=0A@@ -185,19 +185,29 @@ static inline int pte_special(pte_t pte)=0A=
 	return pte_flags(pte) & _PAGE_SPECIAL;=0A }=0A =0A+/* Entries that were s=
et to PROT_NONE are inverted */=0A+=0A+static inline u64 protnone_mask(u64 =
val);=0A+=0A static inline unsigned long pte_pfn(pte_t pte)=0A {=0A-	return=
 (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;=0A+	phys_addr_t pfn =3D pte=
_val(pte);=0A+	pfn ^=3D protnone_mask(pfn);=0A+	return (pfn & PTE_PFN_MASK)=
 >> PAGE_SHIFT;=0A }=0A =0A static inline unsigned long pmd_pfn(pmd_t pmd)=
=0A {=0A-	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;=0A+	phys=
_addr_t pfn =3D pmd_val(pmd);=0A+	pfn ^=3D protnone_mask(pfn);=0A+	return=
 (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;=0A }=0A =0A static inline unsigne=
d long pud_pfn(pud_t pud)=0A {=0A-	return (pud_val(pud) & pud_pfn_mask(pud)=
) >> PAGE_SHIFT;=0A+	phys_addr_t pfn =3D pud_val(pud);=0A+	pfn ^=3D protn=
one_mask(pfn);=0A+	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;=0A }=0A =
=0A static inline unsigned long p4d_pfn(p4d_t p4d)=0A@@ -545,25 +555,33 @@ =
static inline pgprotval_t check_pgprot(pgprot_t pgprot)=0A =0A static inlin=
e pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)=0A {=0A-	return __p=
te(((phys_addr_t)page_nr << PAGE_SHIFT) |=0A-		     check_pgprot(pgprot));=
=0A+	phys_addr_t pfn =3D (phys_addr_t)page_nr << PAGE_SHIFT;=0A+	pfn ^=3D=
 protnone_mask(=
pgprot_val(pgprot));=0A+	pfn &=3D PTE_PFN_MASK;=0A+	return __pte(pfn | chec=
k_pgprot(pgprot));=0A }=0A =0A static inline pmd_t pfn_pmd(unsigned long pa=
ge_nr, pgprot_t pgprot)=0A {=0A-	return __pmd(((phys_addr_t)page_nr << PAGE=
_SHIFT) |=0A-		     check_pgprot(pgprot));=0A+	phys_addr_t pfn =3D =
(phys_addr_t)page_nr =
<< PAGE_SHIFT;=0A+	pfn ^=3D protnone_mask(pgprot_val(pgprot));=0A+	pfn &=3D=
 PHYSICAL_PMD_PAGE_MASK;=0A+	return __pmd(pfn | check_pgprot(pgprot));=0A }=
=0A =0A static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot)=
=0A {=0A-	return __pud(((phys_addr_t)page_nr << PAGE_SHIFT) |=0A-		     che=
ck_pgprot(pgprot));=0A+	phys_addr_t pfn =3D (phys_addr_t)page_nr << PAGE_S=
HIFT;=0A+	pfn =
^=3D protnone_mask(pgprot_val(pgprot));=0A+	pfn &=3D PHYSICAL_PUD_PAGE_MASK=
;=0A+	return __pud(pfn | check_pgprot(pgprot));=0A }=0A =0A+static inline u=
64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);=0A+=0A static inline=
 pte_t pte_modify(pte_t pte, pgprot_t newprot)=0A {=0A-	pteval_t val =3D pt=
e_val(pte);=0A+	pteval_t val =3D pte_val(pte), oldval =3D val;=0A =0A 	/*=
=0A 	 * Chop off the NX bit (if present), and add the NX portion of=0A@@ -5=
71,17 +589,17 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot=
)=0A 	 */=0A 	val &=3D _PAGE_CHG_MASK;=0A 	val |=3D check_pgprot(newprot) &=
 ~_PAGE_CHG_MASK;=0A-=0A+	val =3D flip_protnone_guard(oldval, val, PTE_PFN_=
MASK);=0A 	return __pte(val);=0A }=0A =0A static inline pmd_t pmd_modify(pm=
d_t pmd, pgprot_t newprot)=0A {=0A-	pmdval_t val =3D pmd_val(pmd);=0A+	pmdv=
al_t val =3D pmd_val(pmd), oldval =3D val;=0A =0A 	val &=3D _HPAGE_CHG_MASK=
;=0A 	val |=3D check_pgprot(newprot) & ~_HPAGE_CHG_MASK;=0A-=0A+	val =3D fl=
ip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);=0A 	return __pmd(va=
l);=0A }=0A =0Adiff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/in=
clude/asm/pgtable_64.h=0Aindex 593c3cf259dd..ea99272ab63e 100644=0A--- a/ar=
ch/x86/include/asm/pgtable_64.h=0A+++ b/arch/x86/include/asm/pgtable_64.h=
=0A@@ -357,5 +357,7 @@ static inline bool gup_fast_permitted(unsigned long =
start, int nr_pages,=0A 	return true;=0A }=0A =0A+#include <asm/pgtable-inv=
ert.h>=0A+=0A #endif /* !__ASSEMBLY__ */=0A #endif /* _ASM_X86_PGTABLE_64_H=
 */=0A-- =0A2.14.3=0A=0A=0AFrom c75da7960a5888721ae8921a49dd485e8c97b3c3 Mo=
n Sep 17 00:00:00 2001=0AFrom: Andi Kleen <ak@linux.intel.com>=0ADate: Mon,=
 23 Apr 2018 15:57:54 -0700=0ASubject: [PATCH 4/8] x86, l1tf: Make sure the=
 first page is always reserved=0AStatus: O=0AContent-Length: 985=0ALines: 3=
1=0A=0AThe L1TF workaround doesn't make any attempt to mitigate speculative=
=0Aaccesses to the first physical page for zeroed PTEs. Normally=0Ait only =
contains some data from the early real mode BIOS.=0A=0AI couldn't convince =
myself we always reserve the first page in=0Aall configurations, so add an =
extra reservation call to=0Amake sure it is really reserved. In most config=
urations (e.g.=0Awith the standard reservations) it's likely a nop.=0A=0ASi=
gned-off-by: Andi Kleen <ak@linux.intel.com>=0A---=0A arch/x86/kernel/setup=
=2Ec | 3 +++=0A 1 file changed, 3 insertions(+)=0A=0Adiff --git a/arch/x86/=
kernel/setup.c b/arch/x86/kernel/setup.c=0Aindex 6285697b6e56..fadbd41094d2=
 100644=0A--- a/arch/x86/kernel/setup.c=0A+++ b/arch/x86/kernel/setup.c=0A@=
@ -817,6 +817,9 @@ void __init setup_arch(char **cmdline_p)=0A 	memblock_re=
serve(__pa_symbol(_text),=0A 			 (unsigned long)__bss_stop - (unsigned long=
)_text);=0A =0A+	/* Make sure page 0 is always reserved */=0A+	memblock_res=
erve(0, PAGE_SIZE);=0A+=0A 	early_reserve_initrd();=0A =0A 	/*=0A-- =0A2.14=
=2E3=0A=0A=0AFrom c94c11d610008319d373292207356675438627e8 Mon Sep 17 00:00=
:00 2001=0AFrom: Andi Kleen <ak@linux.intel.com>=0ADate: Fri, 27 Apr 2018 1=
4:44:53 -0700=0ASubject: [PATCH 5/8] x86, l1tf: Add sysfs reporting for l1t=
f=0AStatus: O=0AContent-Length: 5334=0ALines: 141=0A=0AL1TF core kernel wor=
karounds are cheap and normally always enabled.=0AHowever, we still want to =
report in sysfs if the system is vulnerable=0Aor mitigated. Add the necessa=
ry checks.=0A=0A- We use the same checks as Meltdown to determine if the sy=
stem is=0Avulnerable. This excludes some Atom CPUs which don't have this=0A=
problem.=0A- We check for the (very unlikely) memory > MAX_PA/2 case=0A- We=
 check for 32bit non PAE and warn=0A=0ANote this patch will likely conflict=
 with some other workaround patches=0Afloating around, but should be straig=
htforward to fix.=0A=0Av2: Use positive instead of negative flag for WA. F=
ix override=0Areporting.=0Av3: Fix L1TF_WA flag setting=0ASigned-off-by: A=
ndi Kleen <ak@linux.intel.com>=0A---=0A arch/x86/include/asm/cpufeatures.h =
|  2 ++=0A arch/x86/kernel/cpu/bugs.c         | 11 +++++++++++=0A arch/x86/=
kernel/cpu/common.c       | 15 ++++++++++++++-=0A drivers/base/cpu.c       =
          |  8 ++++++++=0A include/linux/cpu.h                |  2 ++=0A 5 =
files changed, 37 insertions(+), 1 deletion(-)=0A=0Adiff --git a/arch/x86/i=
nclude/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h=0Aindex d554c=
11e01ff..f1bfe8a37b84 100644=0A--- a/arch/x86/include/asm/cpufeatures.h=0A+=
++ b/arch/x86/include/asm/cpufeatures.h=0A@@ -214,6 +214,7 @@=0A =0A #defin=
e X86_FEATURE_USE_IBPB		( 7*32+21) /* "" Indirect Branch Prediction Barrier=
 enabled */=0A #define X86_FEATURE_USE_IBRS_FW		( 7*32+22) /* "" Use IBRS d=
uring runtime firmware calls */=0A+#define X86_FEATURE_L1TF_WA		( 7*32+23) =
/* "" L1TF workaround used */=0A =0A /* Virtualization flags: Linux defined=
, word 8 */=0A #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shad=
ow */=0A@@ -362,5 +363,6 @@=0A #define X86_BUG_CPU_MELTDOWN		X86_BUG(14) /*=
 CPU is affected by meltdown attack and needs kernel page table isolation *=
/=0A #define X86_BUG_SPECTRE_V1		X86_BUG(15) /* CPU is affected by Spectre =
variant 1 attack with conditional branches */=0A #define X86_BUG_SPECTRE_V2=
		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect =
branches */=0A+#define X86_BUG_L1TF			X86_BUG(17) /* CPU is affected by L1 =
Terminal Fault */=0A =0A #endif /* _ASM_X86_CPUFEATURES_H */=0Adiff --git a=
/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c=0Aindex bfca937bdc=
c3..e1f67b7c5217 100644=0A--- a/arch/x86/kernel/cpu/bugs.c=0A+++ b/arch/x86=
/kernel/cpu/bugs.c=0A@@ -340,4 +340,15 @@ ssize_t cpu_show_spectre_v2(struc=
t device *dev, struct device_attribute *attr, c=0A 		       boot_cpu_has(X8=
6_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",=0A 		       spectre_v2_module_st=
ring());=0A }=0A+=0A+ssize_t cpu_show_l1tf(struct device *dev, struct devic=
e_attribute *attr, char *buf)=0A+{=0A+	if (!boot_cpu_has_bug(X86_BUG_L1TF))=
=0A+		return sprintf(buf, "Not affected\n");=0A+=0A+	if (boot_cpu_has(X86_F=
EATURE_L1TF_WA))=0A+		return sprintf(buf, "Mitigated\n");=0A+=0A+	return sp=
rintf(buf, "Mitigation Unavailable\n");=0A+}=0A #endif=0Adiff --git a/arch/=
x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c=0Aindex 8a5b185735e1=
=2E.8bb14ccb2f4b 100644=0A--- a/arch/x86/kernel/cpu/common.c=0A+++ b/arch/x=
86/kernel/cpu/common.c=0A@@ -940,6 +940,15 @@ static bool __init cpu_vulner=
able_to_meltdown(struct cpuinfo_x86 *c)=0A 	return true;=0A }=0A =0A+static=
 bool __init l1tf_wa_possible(void)=0A+{=0A+#if CONFIG_PGTABLE_LEVELS =3D=
=3D 2=0A+	pr_warn("Kernel not compiled for PAE. No workaround for L1TF\n");=
=0A+	return false;=0A+#endif=0A+	return true;=0A+}=0A+=0A /*=0A  * Do minim=
um CPU detection early.=0A  * Fields really needed: vendor, cpuid_level, fa=
mily, model, mask,=0A@@ -989,8 +998,12 @@ static void __init early_identify=
_cpu(struct cpuinfo_x86 *c)=0A 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);=0A=
 =0A 	if (!x86_match_cpu(cpu_no_speculation)) {=0A-		if (cpu_vulnerable_to_=
meltdown(c))=0A+		if (cpu_vulnerable_to_meltdown(c)) {=0A 			setup_force_cp=
u_bug(X86_BUG_CPU_MELTDOWN);=0A+			setup_force_cpu_bug(X86_BUG_L1TF);=0A+		=
	if (l1tf_wa_possible())=0A+				setup_force_cpu_cap(X86_FEATURE_L1TF_WA);=
=0A+		}=0A 		setup_force_cpu_bug(X86_BUG_SPECTRE_V1);=0A 		setup_force_cpu_=
bug(X86_BUG_SPECTRE_V2);=0A 	}=0Adiff --git a/drivers/base/cpu.c b/drivers/=
base/cpu.c=0Aindex 2da998baa75c..ed7b8591d461 100644=0A--- a/drivers/base/c=
pu.c=0A+++ b/drivers/base/cpu.c=0A@@ -534,14 +534,22 @@ ssize_t __weak cpu_=
show_spectre_v2(struct device *dev,=0A 	return sprintf(buf, "Not affected\n=
");=0A }=0A =0A+ssize_t __weak cpu_show_l1tf(struct device *dev,=0A+				   =
struct device_attribute *attr, char *buf)=0A+{=0A+	return sprintf(buf, "Not=
 affected\n");=0A+}=0A+=0A static DEVICE_ATTR(meltdown, 0444, cpu_show_melt=
down, NULL);=0A static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, N=
ULL);=0A static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);=
=0A+static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);=0A =0A static stru=
ct attribute *cpu_root_vulnerabilities_attrs[] =3D {=0A 	&dev_attr_meltdown=
=2Eattr,=0A 	&dev_attr_spectre_v1.attr,=0A 	&dev_attr_spectre_v2.attr,=0A+	=
&dev_attr_l1tf.attr,=0A 	NULL=0A };=0A =0Adiff --git a/include/linux/cpu.h =
b/include/linux/cpu.h=0Aindex 7b01bc11c692..75c430046ca0 100644=0A--- a/inc=
lude/linux/cpu.h=0A+++ b/include/linux/cpu.h=0A@@ -53,6 +53,8 @@ extern ssi=
ze_t cpu_show_spectre_v1(struct device *dev,=0A 				   struct device_attrib=
ute *attr, char *buf);=0A extern ssize_t cpu_show_spectre_v2(struct device =
*dev,=0A 				   struct device_attribute *attr, char *buf);=0A+extern ssize_=
t cpu_show_l1tf(struct device *dev,=0A+				   struct device_attribute *attr=
, char *buf);=0A =0A extern __printf(4, 5)=0A struct device *cpu_device_cre=
ate(struct device *parent, void *drvdata,=0A-- =0A2.14.3=0A=0A=0AFrom 524e0=
68e6b7286121da0b3979bb20fd5a2b3fe38 Mon Sep 17 00:00:00 2001=0AFrom: Andi K=
leen <ak@linux.intel.com>=0ADate: Fri, 9 Feb 2018 10:36:15 -0800=0ASubject:=
 [PATCH 6/8] x86, l1tf: Report if too much memory for L1TF workaround=0ASta=
tus: RO=0AContent-Length: 2703=0ALines: 87=0A=0AIf the system has more than=
 MAX_PA/2 physical memory the=0Ainvert page workarounds don't protect the s=
ystem against=0Athe L1TF attack anymore, because an inverted physical addre=
ss=0Awill point to valid memory.=0A=0AWe cannot do much here; after all, use=
rs want to use the=0Amemory, but at least print a warning and report the sy=
stem as=0Avulnerable in sysfs.=0A=0ANote this is all extremely unlikely to h=
appen on a real machine=0Abecause they typically have far more MAX_PA than =
DIMM slots.=0A=0ASome VMs also report fairly small PAs to guests, e.g. only 3=
6bits.=0AIn this case the threshold will be lower, but applies only=0Ato th=
e maximum guest size.=0A=0ASince this needs to clear a feature bit that has=
 been forced=0Aearlier, add a special "unforce" macro that supports this.=0A=
=0ASigned-off-by: Andi Kleen <ak@linux.intel.com>=0A---=0A arch/x86/include=
/asm/cpufeature.h |  5 +++++=0A arch/x86/kernel/setup.c           | 25 ++++=
++++++++++++++++++++-=0A 2 files changed, 29 insertions(+), 1 deletion(-)=
=0A=0Adiff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm=
/cpufeature.h=0Aindex b27da9602a6d..f78bfd2464c1 100644=0A--- a/arch/x86/in=
clude/asm/cpufeature.h=0A+++ b/arch/x86/include/asm/cpufeature.h=0A@@ -138,=
6 +138,11 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int =
bit);=0A 	set_bit(bit, (unsigned long *)cpu_caps_set);	\=0A } while (0)=0A =
=0A+#define setup_unforce_cpu_cap(bit) do { \=0A+	clear_cpu_cap(&boot_cpu_d=
ata, bit);	\=0A+	clear_bit(bit, (unsigned long *)cpu_caps_set);	\=0A+} whil=
e (0)=0A+=0A #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)=0A =
=0A /*=0Adiff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c=0Ai=
ndex fadbd41094d2..b49fcb3e3a97 100644=0A--- a/arch/x86/kernel/setup.c=0A++=
+ b/arch/x86/kernel/setup.c=0A@@ -779,7 +779,28 @@ static void __init trim_=
low_memory_range(void)=0A {=0A 	memblock_reserve(0, ALIGN(reserve_low, PAGE=
_SIZE));=0A }=0A-	=0A+=0A+static __init void check_maxpa_memory(void)=0A+{=
=0A+	u64 len;=0A+=0A+	if (!boot_cpu_has(X86_BUG_L1TF))=0A+		return;=0A+=0A+=
	len =3D BIT_ULL(boot_cpu_data.x86_phys_bits - 1) - 1;=0A+=0A+	/*=0A+	 * Th=
is is extremely unlikely to happen because systems nearly always have far=0A+=
	 * more MAX_PA than DIMM slots.=0A+	 */=0A+	if (e820__mapped_any(len, ULLO=
NG_MAX - len,=0A+				     E820_TYPE_RAM)) {=0A+		pr_warn("System has more t=
han MAX_PA/2 memory. Disabled L1TF workaround\n");=0A+		/* Was forced earli=
er, so now unforce it. */=0A+		setup_unforce_cpu_cap(X86_FEATURE_L1TF_WA);=
=0A+	}=0A+}=0A+=0A /*=0A  * Dump out kernel offset information on panic.=0A=
  */=0A@@ -1016,6 +1037,8 @@ void __init setup_arch(char **cmdline_p)=0A 	i=
nsert_resource(&iomem_resource, &data_resource);=0A 	insert_resource(&iomem=
_resource, &bss_resource);=0A =0A+	check_maxpa_memory();=0A+=0A 	e820_add_k=
ernel_range();=0A 	trim_bios_range();=0A #ifdef CONFIG_X86_32=0A-- =0A2.14.=
3=0A=0A=0AFrom b31f6dd0e2447e3cbc0959209a946a5224d10499 Mon Sep 17 00:00:00=
 2001=0AFrom: Andi Kleen <ak@linux.intel.com>=0ADate: Fri, 27 Apr 2018 15:2=
9:17 -0700=0ASubject: [PATCH 7/8] x86, l1tf: Limit swap file size to MAX_PA=
/2=0AStatus: O=0AContent-Length: 5291=0ALines: 148=0A=0AFor the L1TF workar=
ound we want to limit the swap file size to below=0AMAX_PA/2, so that the h=
igher bits of the inverted swap offset never=0Apoint to valid memory.=0A=0A=
Add a way for the architecture to override the swap file=0Asize check in sw=
apfile.c and add an x86-specific max swapfile check=0Afunction that enforces=
 that limit.=0A=0AThe check is only enabled if the CPU is vulnerable to L1T=
F.=0A=0AIn VMs with 42bit MAX_PA the typical limit is 2TB now,=0Aon a nativ=
e system with 46bit PA it is 32TB. The limit=0Ais only per individual swap =
file, so it's always possible=0Ato exceed these limits with multiple swap f=
iles or=0Apartitions.=0A=0Av2: Use new helper for maxpa_mask computation.=
=0ASigned-off-by: Andi Kleen <ak@linux.intel.com>=0A---=0A arch/x86/include=
/asm/processor.h |  5 +++++=0A arch/x86/mm/init.c               | 15 ++++++=
++++++++=0A include/linux/swapfile.h         |  2 ++=0A mm/swapfile.c      =
              | 44 +++++++++++++++++++++++++---------------=0A 4 files chan=
ged, 50 insertions(+), 16 deletions(-)=0A=0Adiff --git a/arch/x86/include/a=
sm/processor.h b/arch/x86/include/asm/processor.h=0Aindex 21a114914ba4..2bd=
676e450cf 100644=0A--- a/arch/x86/include/asm/processor.h=0A+++ b/arch/x86/=
include/asm/processor.h=0A@@ -181,6 +181,11 @@ extern const struct seq_oper=
ations cpuinfo_op;=0A =0A extern void cpu_detect(struct cpuinfo_x86 *c);=0A=
 =0A+static inline u64 maxpa_pfn_bit(int offset)=0A+{=0A+	return BIT_ULL(bo=
ot_cpu_data.x86_phys_bits - offset - PAGE_SHIFT);=0A+}=0A+=0A extern void e=
arly_cpu_init(void);=0A extern void identify_boot_cpu(void);=0A extern void=
 identify_secondary_cpu(struct cpuinfo_x86 *);=0Adiff --git a/arch/x86/mm/i=
nit.c b/arch/x86/mm/init.c=0Aindex fec82b577c18..b4078eb05ca0 100644=0A--- =
a/arch/x86/mm/init.c=0A+++ b/arch/x86/mm/init.c=0A@@ -4,6 +4,8 @@=0A #inclu=
de <linux/swap.h>=0A #include <linux/memblock.h>=0A #include <linux/bootmem=
=2Eh>	/* for max_low_pfn */=0A+#include <linux/swapfile.h>=0A+#include <lin=
ux/swapops.h>=0A =0A #include <asm/set_memory.h>=0A #include <asm/e820/api.=
h>=0A@@ -878,3 +880,16 @@ void update_cache_mode_entry(unsigned entry, enum=
 page_cache_mode cache)=0A 	__cachemode2pte_tbl[cache] =3D __cm_idx2pte(ent=
ry);=0A 	__pte2cachemode_tbl[entry] =3D cache;=0A }=0A+=0A+unsigned long ma=
x_swapfile_size(void)=0A+{=0A+	unsigned long pages;=0A+=0A+	pages =3D gener=
ic_max_swapfile_size();=0A+=0A+	if (boot_cpu_has(X86_BUG_L1TF)) {=0A+		/* L=
imit the swap file size to MAX_PA/2 for L1TF workaround */=0A+		pages =3D m=
in_t(unsigned long, maxpa_pfn_bit(1), pages);=0A+	}=0A+	return pages;=0A+}=
=0Adiff --git a/include/linux/swapfile.h b/include/linux/swapfile.h=0Aindex=
 06bd7b096167..e06febf62978 100644=0A--- a/include/linux/swapfile.h=0A+++ b=
/include/linux/swapfile.h=0A@@ -10,5 +10,7 @@ extern spinlock_t swap_lock;=
=0A extern struct plist_head swap_active_head;=0A extern struct swap_info_s=
truct *swap_info[];=0A extern int try_to_unuse(unsigned int, bool, unsigned=
 long);=0A+extern unsigned long generic_max_swapfile_size(void);=0A+extern =
unsigned long max_swapfile_size(void);=0A =0A #endif /* _LINUX_SWAPFILE_H *=
/=0Adiff --git a/mm/swapfile.c b/mm/swapfile.c=0Aindex cc2cf04d9018..413f48=
424194 100644=0A--- a/mm/swapfile.c=0A+++ b/mm/swapfile.c=0A@@ -2909,6 +290=
9,33 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode =
*inode)=0A 	return 0;=0A }=0A =0A+=0A+/*=0A+ * Find out how many pages are =
allowed for a single swap=0A+ * device. There are two limiting factors: 1) =
the number=0A+ * of bits for the swap offset in the swp_entry_t type, and=
=0A+ * 2) the number of bits in the swap pte as defined by the=0A+ * differ=
ent architectures. In order to find the=0A+ * largest possible bit mask, a =
swap entry with swap type 0=0A+ * and swap offset ~0UL is created, encoded =
to a swap pte,=0A+ * decoded to a swp_entry_t again, and finally the swap=
=0A+ * offset is extracted. This will mask all the bits from=0A+ * the init=
ial ~0UL mask that can't be encoded in either=0A+ * the swp_entry_t or the =
architecture definition of a=0A+ * swap pte.=0A+ */=0A+unsigned long generi=
c_max_swapfile_size(void)=0A+{=0A+	return swp_offset(pte_to_swp_entry(=0A+	=
		swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;=0A+}=0A+=0A+/* Can be overrid=
den by an architecture for additional checks. */=0A+__weak unsigned long ma=
x_swapfile_size(void)=0A+{=0A+	return generic_max_swapfile_size();=0A+}=0A+=
=0A static unsigned long read_swap_header(struct swap_info_struct *p,=0A 		=
			union swap_header *swap_header,=0A 					struct inode *inode)=0A@@ -2944,=
22 +2971,7 @@ static unsigned long read_swap_header(struct swap_info_struct=
 *p,=0A 	p->cluster_next =3D 1;=0A 	p->cluster_nr =3D 0;=0A =0A-	/*=0A-	 * =
Find out how many pages are allowed for a single swap=0A-	 * device. There =
are two limiting factors: 1) the number=0A-	 * of bits for the swap offset =
in the swp_entry_t type, and=0A-	 * 2) the number of bits in the swap pte a=
s defined by the=0A-	 * different architectures. In order to find the=0A-	 =
* largest possible bit mask, a swap entry with swap type 0=0A-	 * and swap =
offset ~0UL is created, encoded to a swap pte,=0A-	 * decoded to a swp_entr=
y_t again, and finally the swap=0A-	 * offset is extracted. This will mask =
all the bits from=0A-	 * the initial ~0UL mask that can't be encoded in eit=
her=0A-	 * the swp_entry_t or the architecture definition of a=0A-	 * swap =
pte.=0A-	 */=0A-	maxpages =3D swp_offset(pte_to_swp_entry(=0A-			swp_entry_=
to_pte(swp_entry(0, ~0UL)))) + 1;=0A+	maxpages =3D max_swapfile_size();=0A =
	last_page =3D swap_header->info.last_page;=0A 	if (!last_page) {=0A 		pr_w=
arn("Empty swap-file\n");=0A-- =0A2.14.3=0A=0A=0AFrom 76d1413d7854087f1c2c0=
870eeedc77507c2f25a Mon Sep 17 00:00:00 2001=0AFrom: Andi Kleen <ak@linux.i=
ntel.com>=0ADate: Thu, 3 May 2018 16:39:51 -0700=0ASubject: [PATCH 8/8] mm,=
 l1tf: Disallow non privileged high MMIO PROT_NONE=0A mappings=0AStatus: O=
=0AContent-Length: 9351=0ALines: 291=0A=0AFor L1TF, PROT_NONE mappings are =
protected by inverting the PFN in the=0Apage table entry. This sets the hig=
h bits in the CPU's address space,=0Athus making sure not to point an unma=
pped entry to valid=0Acached memory.=0A=0ASome server system BIOSes put the=
 MMIO mappings high up in the physical=0Aaddress space. If such a high mapp=
ing was mapped to an unprivileged=0Auser, they could attack low memory by =
setting such a mapping to=0APROT_NONE. This could happen through a special=
 device driver=0Awhich is not access protected. Normal /dev/mem is of cour=
se=0Aaccess protected.=0A=0ATo avoid this, we forbid PROT_NONE mappings=
 or mprotect for high MMIO=0Amappings.=0A=0AValid page mappings are allowe=
d because the system is then unsafe=0Aanyway.=0A=0AWe don't expect users=
 to comm=
only use PROT_NONE on MMIO. But=0Ato minimize any impact here we only do th=
is if the mapping actually=0Arefers to a high MMIO address (defined as the =
MAX_PA-1 bit being set),=0Aand also skip the check for root.=0A=0AFor mmaps=
 this is straightforward and can be handled in vm_insert_pfn=0Aand in rema=
p_pfn_range().=0A=0AFor mprotect it's a bit trickier. At the point we're lo=
oking at the=0Aactual PTEs, a lot of state has been changed and would be di=
fficult=0Ato undo on an error. Since this is an uncommon case, we use a sep=
arate=0Aearly page table walk pass for MMIO PROT_NONE mappings that=0Ache=
cks fo=
r this condition early. For non MMIO and non PROT_NONE=0Athere are no chang=
es.=0A=0Av2: Use new helpers added earlier=0ASigned-off-by: Andi Kleen <ak@=
linux.intel.com>=0A---=0A arch/x86/include/asm/pgtable.h |  4 ++++=0A arch/=
x86/mm/mmap.c             | 19 ++++++++++++++++=0A include/asm-generic/pgta=
ble.h  | 12 +++++++++++=0A mm/memory.c                    | 37 ++++++++++++=
++++++++++---------=0A mm/mprotect.c                  | 49 ++++++++++++++++=
++++++++++++++++++++++++++=0A 5 files changed, 111 insertions(+), 10 deleti=
ons(-)=0A=0Adiff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/=
asm/pgtable.h=0Aindex f811e3257e87..338897c3b36f 100644=0A--- a/arch/x86/in=
clude/asm/pgtable.h=0A+++ b/arch/x86/include/asm/pgtable.h=0A@@ -1333,6 +13=
33,10 @@ static inline bool pud_access_permitted(pud_t pud, bool write)=0A =
	return __pte_access_permitted(pud_val(pud), write);=0A }=0A =0A+#define __=
HAVE_ARCH_PFN_MODIFY_ALLOWED 1=0A+extern bool pfn_modify_allowed(unsigned l=
ong pfn, pgprot_t prot);=0A+static inline bool arch_has_pfn_modify_check(vo=
id) { return true; }=0A+=0A #include <asm-generic/pgtable.h>=0A #endif	/* _=
_ASSEMBLY__ */=0A =0Adiff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c=
=0Aindex 48c591251600..369b67226f81 100644=0A--- a/arch/x86/mm/mmap.c=0A+++=
 b/arch/x86/mm/mmap.c=0A@@ -240,3 +240,22 @@ int valid_mmap_phys_addr_range=
(unsigned long pfn, size_t count)=0A =0A 	return phys_addr_valid(addr + cou=
nt - 1);=0A }=0A+=0A+/*=0A+ * Only allow root to set high MMIO mappings to =
PROT_NONE.=0A+ * This prevents an unprivileged user from setting them to =
PROT_NONE and=0A+ * inverting them, making them point to valid memory for =
L1TF speculation.=0A+=
 */=0A+bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)=0A+{=0A+	i=
f (!boot_cpu_has(X86_BUG_L1TF))=0A+		return true;=0A+	if (__pte_needs_inver=
t(pgprot_val(prot)))=0A+		return true;=0A+	/* If it's real memory always al=
low */=0A+	if (pfn_valid(pfn))=0A+		return true;=0A+	if ((pfn & maxpa_pfn_b=
it(1)) && !capable(CAP_SYS_ADMIN))=0A+		return false;=0A+	return true;=0A+}=
=0Adiff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable=
=2Eh=0Aindex f59639afaa39..0ecc1197084b 100644=0A--- a/include/asm-generic/=
pgtable.h=0A+++ b/include/asm-generic/pgtable.h=0A@@ -1097,4 +1097,16 @@ st=
atic inline void init_espfix_bsp(void) { }=0A #endif=0A #endif=0A =0A+#ifnd=
ef __HAVE_ARCH_PFN_MODIFY_ALLOWED=0A+static inline bool pfn_modify_allowed(=
unsigned long pfn, pgprot_t prot)=0A+{=0A+	return true;=0A+}=0A+=0A+static =
inline bool arch_has_pfn_modify_check(void)=0A+{=0A+	return false;=0A+}=0A+=
#endif=0A+=0A #endif /* _ASM_GENERIC_PGTABLE_H */=0Adiff --git a/mm/memory.=
c b/mm/memory.c=0Aindex 01f5464e0fd2..fe497cecd2ab 100644=0A--- a/mm/memory=
=2Ec=0A+++ b/mm/memory.c=0A@@ -1891,6 +1891,9 @@ int vm_insert_pfn_prot(str=
uct vm_area_struct *vma, unsigned long addr,=0A 	if (addr < vma->vm_start |=
| addr >=3D vma->vm_end)=0A 		return -EFAULT;=0A =0A+	if (!pfn_modify_allow=
ed(pfn, pgprot))=0A+		return -EACCES;=0A+=0A 	track_pfn_insert(vma, &pgprot=
, __pfn_to_pfn_t(pfn, PFN_DEV));=0A =0A 	ret =3D insert_pfn(vma, addr, __pf=
n_to_pfn_t(pfn, PFN_DEV), pgprot,=0A@@ -1926,6 +1929,9 @@ static int __vm_i=
nsert_mixed(struct vm_area_struct *vma, unsigned long addr,=0A =0A 	track_p=
fn_insert(vma, &pgprot, pfn);=0A =0A+	if (!pfn_modify_allowed(pfn_t_to_pfn(=
pfn), pgprot))=0A+		return -EACCES;=0A+=0A 	/*=0A 	 * If we don't have pte =
special, then we have to use the pfn_valid()=0A 	 * based VM_MIXEDMAP schem=
e (see vm_normal_page), and thus we *must*=0A@@ -1973,6 +1979,7 @@ static i=
nt remap_pte_range(struct mm_struct *mm, pmd_t *pmd,=0A {=0A 	pte_t *pte;=
=0A 	spinlock_t *ptl;=0A+	int err =3D 0;=0A =0A 	pte =3D pte_alloc_map_lock=
(mm, pmd, addr, &ptl);=0A 	if (!pte)=0A@@ -1980,12 +1987,16 @@ static int r=
emap_pte_range(struct mm_struct *mm, pmd_t *pmd,=0A 	arch_enter_lazy_mmu_mo=
de();=0A 	do {=0A 		BUG_ON(!pte_none(*pte));=0A+		if (!pfn_modify_allowed(p=
fn, prot)) {=0A+			err =3D -EACCES;=0A+			break;=0A+		}=0A 		set_pte_at(mm,=
 addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));=0A 		pfn++;=0A 	} while (pt=
e++, addr +=3D PAGE_SIZE, addr !=3D end);=0A 	arch_leave_lazy_mmu_mode();=
=0A 	pte_unmap_unlock(pte - 1, ptl);=0A-	return 0;=0A+	return err;=0A }=0A =
=0A static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,=0A@=
@ -1994,6 +2005,7 @@ static inline int remap_pmd_range(struct mm_struct *mm=
, pud_t *pud,=0A {=0A 	pmd_t *pmd;=0A 	unsigned long next;=0A+	int err;=0A =
=0A 	pfn -=3D addr >> PAGE_SHIFT;=0A 	pmd =3D pmd_alloc(mm, pud, addr);=0A@=
@ -2002,9 +2014,10 @@ static inline int remap_pmd_range(struct mm_struct *m=
m, pud_t *pud,=0A 	VM_BUG_ON(pmd_trans_huge(*pmd));=0A 	do {=0A 		next =3D =
pmd_addr_end(addr, end);=0A-		if (remap_pte_range(mm, pmd, addr, next,=0A-	=
			pfn + (addr >> PAGE_SHIFT), prot))=0A-			return -ENOMEM;=0A+		err =3D re=
map_pte_range(mm, pmd, addr, next,=0A+				pfn + (addr >> PAGE_SHIFT), prot)=
;=0A+		if (err)=0A+			return err;=0A 	} while (pmd++, addr =3D next, addr !=
=3D end);=0A 	return 0;=0A }=0A@@ -2015,6 +2028,7 @@ static inline int rema=
p_pud_range(struct mm_struct *mm, p4d_t *p4d,=0A {=0A 	pud_t *pud;=0A 	unsi=
gned long next;=0A+	int err;=0A =0A 	pfn -=3D addr >> PAGE_SHIFT;=0A 	pud =
=3D pud_alloc(mm, p4d, addr);=0A@@ -2022,9 +2036,10 @@ static inline int re=
map_pud_range(struct mm_struct *mm, p4d_t *p4d,=0A 		return -ENOMEM;=0A 	do=
 {=0A 		next =3D pud_addr_end(addr, end);=0A-		if (remap_pmd_range(mm, pud,=
 addr, next,=0A-				pfn + (addr >> PAGE_SHIFT), prot))=0A-			return -ENOMEM=
;=0A+		err =3D remap_pmd_range(mm, pud, addr, next,=0A+				pfn + (addr >> P=
AGE_SHIFT), prot);=0A+		if (err)=0A+			return err;=0A 	} while (pud++, addr=
 =3D next, addr !=3D end);=0A 	return 0;=0A }=0A@@ -2035,6 +2050,7 @@ stati=
c inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,=0A {=0A 	p4d=
_t *p4d;=0A 	unsigned long next;=0A+	int err;=0A =0A 	pfn -=3D addr >> PAGE=
_SHIFT;=0A 	p4d =3D p4d_alloc(mm, pgd, addr);=0A@@ -2042,9 +2058,10 @@ stat=
ic inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,=0A 		return=
 -ENOMEM;=0A 	do {=0A 		next =3D p4d_addr_end(addr, end);=0A-		if (remap_pu=
d_range(mm, p4d, addr, next,=0A-				pfn + (addr >> PAGE_SHIFT), prot))=0A-	=
		return -ENOMEM;=0A+		err =3D remap_pud_range(mm, p4d, addr, next,=0A+				=
pfn + (addr >> PAGE_SHIFT), prot);=0A+		if (err)=0A+			return err;=0A 	} wh=
ile (p4d++, addr =3D next, addr !=3D end);=0A 	return 0;=0A }=0Adiff --git =
a/mm/mprotect.c b/mm/mprotect.c=0Aindex 625608bc8962..6d331620b9e5 100644=
=0A--- a/mm/mprotect.c=0A+++ b/mm/mprotect.c=0A@@ -306,6 +306,42 @@ unsigne=
d long change_protection(struct vm_area_struct *vma, unsigned long start,=
=0A 	return pages;=0A }=0A =0A+static int prot_none_pte_entry(pte_t *pte, u=
nsigned long addr,=0A+			       unsigned long next, struct mm_walk *walk)=
=0A+{=0A+	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->priv=
ate)) ?=0A+		0 : -EACCES;=0A+}=0A+=0A+static int prot_none_hugetlb_entry(pt=
e_t *pte, unsigned long hmask,=0A+				   unsigned long addr, unsigned long =
next,=0A+				   struct mm_walk *walk)=0A+{=0A+	return pfn_modify_allowed(pt=
e_pfn(*pte), *(pgprot_t *)(walk->private)) ?=0A+		0 : -EACCES;=0A+}=0A+=0A+=
static int prot_none_test(unsigned long addr, unsigned long next,=0A+			  s=
truct mm_walk *walk)=0A+{=0A+	return 0;=0A+}=0A+=0A+static int prot_none_wa=
lk(struct vm_area_struct *vma, unsigned long start,=0A+			   unsigned long =
end, unsigned long newflags)=0A+{=0A+	pgprot_t new_pgprot =3D vm_get_page_p=
rot(newflags);=0A+	struct mm_walk prot_none_walk =3D {=0A+		.pte_entry =3D =
prot_none_pte_entry,=0A+		.hugetlb_entry =3D prot_none_hugetlb_entry,=0A+		=
=2Etest_walk =3D prot_none_test,=0A+		.mm =3D current->mm,=0A+		.private =
=3D &new_pgprot,=0A+	};=0A+=0A+	return walk_page_range(start, end, &prot_no=
ne_walk);=0A+}=0A+=0A int=0A mprotect_fixup(struct vm_area_struct *vma, str=
uct vm_area_struct **pprev,=0A 	unsigned long start, unsigned long end, uns=
igned long newflags)=0A@@ -323,6 +359,19 @@ mprotect_fixup(struct vm_area_s=
truct *vma, struct vm_area_struct **pprev,=0A 		return 0;=0A 	}=0A =0A+	/*=
=0A+	 * Do PROT_NONE PFN permission checks here when we can still=0A+	 * ba=
il out without undoing a lot of state. This is a rather=0A+	 * uncommon cas=
e, so doesn't need to be very optimized.=0A+	 */=0A+	if (arch_has_pfn_modif=
y_check() &&=0A+	    (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&=0A+	    (=
newflags & (VM_READ|VM_WRITE|VM_EXEC)) =3D=3D 0) {=0A+		error =3D prot_none=
_walk(vma, start, end, newflags);=0A+		if (error)=0A+			return error;=0A+	}=
=0A+=0A 	/*=0A 	 * If we make a private mapping writable we increase our co=
mmit;=0A 	 * but (without finer accounting) cannot reduce our commit if we=
=0A-- =0A2.14.3=0A=0A
--mP3DRpeJDSE+ciuQ--