* [4.4] broken conversion from efi to kernel page table
@ 2018-01-11 19:07 Pavel Tatashin
2018-01-11 19:07 ` [4.4] x86/pti/efi: " Pavel Tatashin
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Pavel Tatashin @ 2018-01-11 19:07 UTC
To: steven.sistare, linux-kernel, tglx, mingo, hpa, x86, gregkh,
jkosina, hughd, dave.hansen, luto, torvalds
This fixes the boot panics and hangs that I reported in this thread:
http://lkml.iu.edu/hypermail/linux/kernel/1801.1/00951.html
I have not yet verified whether a similar issue applies to newer
kernels.
Re-sending it as a proper git send-email submission.
Pavel Tatashin (1):
x86/pti/efi: broken conversion from efi to kernel page table
arch/x86/include/asm/kaiser.h | 8 ++++++++
arch/x86/realmode/init.c | 4 +++-
arch/x86/realmode/rm/trampoline_64.S | 3 ++-
3 files changed, 13 insertions(+), 2 deletions(-)
--
1.8.3.1

* [4.4] x86/pti/efi: broken conversion from efi to kernel page table
2018-01-11 19:07 [4.4] broken conversion from efi to kernel page table Pavel Tatashin
@ 2018-01-11 19:07 ` Pavel Tatashin
2018-01-13 13:15 ` Patch "x86/pti/efi: broken conversion from efi to kernel page table" has been added to the 4.4-stable tree gregkh
2018-01-12 10:04 ` [4.4] broken conversion from efi to kernel page table Jiri Kosina
2018-01-13 13:18 ` Greg KH
2 siblings, 1 reply; 8+ messages in thread
From: Pavel Tatashin @ 2018-01-11 19:07 UTC
To: steven.sistare, linux-kernel, tglx, mingo, hpa, x86, gregkh,
jkosina, hughd, dave.hansen, luto, torvalds

In entry_64.S we have code like this:

	/* Unconditionally use kernel CR3 for do_nmi() */
	/* %rax is saved above, so OK to clobber here */
	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
	pushq	%rax
	/* mask off "user" bit of pgd address and 12 PCID bits: */
	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
	movq	%rax, %cr3
2:
	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
	call	do_nmi

With this instruction:

	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax

We unconditionally switch from whatever our CR3 was to kernel page table.
But, in arch/x86/platform/efi/efi_64.c we temporarily set a different page
table, that does not have the kernel page table with 0x1000 offset from it.

Look in efi_thunk() and efi_thunk_set_virtual_address_map().

So, while CR3 points to the other page table, we get an NMI interrupt,
and clear 0x1000 from CR3, resulting in a bogus CR3 if the 0x1000 bit was
set.

The efi page table comes from realmode/rm/trampoline_64.S:

arch/x86/realmode/rm/trampoline_64.S

	.bss
	.balign	PAGE_SIZE
GLOBAL(trampoline_pgd)		.space	PAGE_SIZE

Notice: alignment is PAGE_SIZE, so after applying KAISER_SHADOW_PGD_OFFSET,
which is equal to PAGE_SIZE, we can get a different page table.

But, even if we fix alignment, here the trampoline binary is later copied
into dynamically allocated memory in reserve_real_mode(), so we need to
fix that place as well.

Fixes: 8a43ddfb93a0 ("KAISER: Kernel Address Isolation")

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
---
 arch/x86/include/asm/kaiser.h        | 8 ++++++++
 arch/x86/realmode/init.c             | 4 +++-
 arch/x86/realmode/rm/trampoline_64.S | 3 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
index 802bbbdfe143..e087bd7a8d29 100644
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -19,6 +19,12 @@
 
 #define KAISER_SHADOW_PGD_OFFSET 0x1000
 
+/*
+ * A page table address must have this alignment to stay the same when
+ * KAISER_SHADOW_PGD_OFFSET mask is applied
+ */
+#define KAISER_KERNEL_PGD_ALIGNMENT (KAISER_SHADOW_PGD_OFFSET << 1)
+
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
@@ -71,6 +77,8 @@ movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
 
 #else /* CONFIG_PAGE_TABLE_ISOLATION */
 
+#define KAISER_KERNEL_PGD_ALIGNMENT PAGE_SIZE
+
 .macro SWITCH_KERNEL_CR3
 .endm
 .macro SWITCH_USER_CR3
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d98440..cfecb7d6c6a8 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,5 +1,6 @@
 #include <linux/io.h>
 #include <linux/memblock.h>
+#include <linux/kaiser.h>
 
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
@@ -15,7 +16,8 @@ void __init reserve_real_mode(void)
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
 
 	/* Has to be under 1M so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	mem = memblock_find_in_range(0, 1 << 20, size,
+				     KAISER_KERNEL_PGD_ALIGNMENT);
 	if (!mem)
 		panic("Cannot allocate trampoline\n");
 
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20d2f9d..781cca63f795 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include <asm/msr.h>
 #include <asm/segment.h>
 #include <asm/processor-flags.h>
+#include <asm/kaiser.h>
 #include "realmode.h"
 
 	.text
@@ -139,7 +140,7 @@ tr_gdt:
 tr_gdt_end:
 
 	.bss
-	.balign	PAGE_SIZE
+	.balign	KAISER_KERNEL_PGD_ALIGNMENT
 GLOBAL(trampoline_pgd)		.space	PAGE_SIZE
 
 	.balign	8
-- 
1.8.3.1
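
The alignment argument in the commit message can be checked with a few lines of user-space C. This is only a sketch under stated assumptions: the page size is taken to be 4 KiB, X86_CR3_PCID_ASID_MASK is taken to be the low 12 bits (as the entry_64.S comment suggests), the helper names nmi_kernel_cr3(), align_up() and check_examples() are hypothetical, and the addresses are made up. It models the andq from the NMI path and shows that a pgd that is only PAGE_SIZE aligned can be moved by the mask, while one aligned to KAISER_KERNEL_PGD_ALIGNMENT is left alone.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE                   0x1000ULL /* assumption: 4 KiB pages */
#define KAISER_SHADOW_PGD_OFFSET    0x1000ULL /* value from asm/kaiser.h in the patch */
#define KAISER_KERNEL_PGD_ALIGNMENT (KAISER_SHADOW_PGD_OFFSET << 1)
#define X86_CR3_PCID_ASID_MASK      0xFFFULL  /* assumption: the 12 PCID bits */

/* Hypothetical helper modelling the andq in the NMI entry path. */
static uint64_t nmi_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET);
}

/* Hypothetical helper: round addr up to a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Self-check using made-up physical addresses below 1M. */
static void check_examples(void)
{
	uint64_t odd_page = 0x99000; /* PAGE_SIZE aligned, but bit 12 is set */
	uint64_t fixed = align_up(odd_page, KAISER_KERNEL_PGD_ALIGNMENT);

	/* The mask silently moves CR3 to the neighbouring page. */
	assert(nmi_kernel_cr3(odd_page) != odd_page);
	/* With the stricter alignment, bit 12 is clear and CR3 is unchanged. */
	assert(nmi_kernel_cr3(fixed) == fixed);
}
```

Calling check_examples() from a small main() exercises both cases. The same reasoning is why the patch changes both the .balign directive and the alignment argument passed to memblock_find_in_range().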

* Patch "x86/pti/efi: broken conversion from efi to kernel page table" has been added to the 4.4-stable tree
2018-01-11 19:07 ` [4.4] x86/pti/efi: " Pavel Tatashin
@ 2018-01-13 13:15 ` gregkh
0 siblings, 0 replies; 8+ messages in thread
From: gregkh @ 2018-01-13 13:15 UTC
To: pasha.tatashin, gregkh, steven.sistare; +Cc: stable, stable-commits

This is a note to let you know that I've just added the patch titled

    x86/pti/efi: broken conversion from efi to kernel page table

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     x86-pti-efi-broken-conversion-from-efi-to-kernel-page-table.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.

>From pasha.tatashin@oracle.com Sat Jan 13 14:14:57 2018
From: Pavel Tatashin <pasha.tatashin@oracle.com>
Date: Thu, 11 Jan 2018 14:07:46 -0500
Subject: x86/pti/efi: broken conversion from efi to kernel page table
To: steven.sistare@oracle.com, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, gregkh@linuxfoundation.org, jkosina@suse.cz, hughd@google.com, dave.hansen@linux.intel.com, luto@kernel.org, torvalds@linux-foundation.org
Message-ID: <20180111190746.15426-2-pasha.tatashin@oracle.com>

From: Pavel Tatashin <pasha.tatashin@oracle.com>

In entry_64.S we have code like this:

	/* Unconditionally use kernel CR3 for do_nmi() */
	/* %rax is saved above, so OK to clobber here */
	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
	pushq	%rax
	/* mask off "user" bit of pgd address and 12 PCID bits: */
	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
	movq	%rax, %cr3
2:
	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
	call	do_nmi

With this instruction:

	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax

We unconditionally switch from whatever our CR3 was to kernel page table.
But, in arch/x86/platform/efi/efi_64.c we temporarily set a different page
table, that does not have the kernel page table with 0x1000 offset from it.

Look in efi_thunk() and efi_thunk_set_virtual_address_map().

So, while CR3 points to the other page table, we get an NMI interrupt,
and clear 0x1000 from CR3, resulting in a bogus CR3 if the 0x1000 bit was
set.

The efi page table comes from realmode/rm/trampoline_64.S:

arch/x86/realmode/rm/trampoline_64.S

	.bss
	.balign	PAGE_SIZE
GLOBAL(trampoline_pgd)		.space	PAGE_SIZE

Notice: alignment is PAGE_SIZE, so after applying KAISER_SHADOW_PGD_OFFSET,
which is equal to PAGE_SIZE, we can get a different page table.

But, even if we fix alignment, here the trampoline binary is later copied
into dynamically allocated memory in reserve_real_mode(), so we need to
fix that place as well.

Fixes: 8a43ddfb93a0 ("KAISER: Kernel Address Isolation")

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kaiser.h        | 8 ++++++++
 arch/x86/realmode/init.c             | 4 +++-
 arch/x86/realmode/rm/trampoline_64.S | 3 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -19,6 +19,12 @@
 
 #define KAISER_SHADOW_PGD_OFFSET 0x1000
 
+/*
+ * A page table address must have this alignment to stay the same when
+ * KAISER_SHADOW_PGD_OFFSET mask is applied
+ */
+#define KAISER_KERNEL_PGD_ALIGNMENT (KAISER_SHADOW_PGD_OFFSET << 1)
+
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
@@ -71,6 +77,8 @@ movq PER_CPU_VAR(unsafe_stack_register_b
 
 #else /* CONFIG_PAGE_TABLE_ISOLATION */
 
+#define KAISER_KERNEL_PGD_ALIGNMENT PAGE_SIZE
+
 .macro SWITCH_KERNEL_CR3
 .endm
 .macro SWITCH_USER_CR3
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,5 +1,6 @@
 #include <linux/io.h>
 #include <linux/memblock.h>
+#include <linux/kaiser.h>
 
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
@@ -15,7 +16,8 @@ void __init reserve_real_mode(void)
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
 
 	/* Has to be under 1M so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	mem = memblock_find_in_range(0, 1 << 20, size,
+				     KAISER_KERNEL_PGD_ALIGNMENT);
 	if (!mem)
 		panic("Cannot allocate trampoline\n");
 
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include <asm/msr.h>
 #include <asm/segment.h>
 #include <asm/processor-flags.h>
+#include <asm/kaiser.h>
 #include "realmode.h"
 
 	.text
@@ -139,7 +140,7 @@ tr_gdt:
 tr_gdt_end:
 
 	.bss
-	.balign	PAGE_SIZE
+	.balign	KAISER_KERNEL_PGD_ALIGNMENT
 GLOBAL(trampoline_pgd)		.space	PAGE_SIZE
 
 	.balign	8

Patches currently in stable-queue which might be from pasha.tatashin@oracle.com are

queue-4.4/x86-pti-efi-broken-conversion-from-efi-to-kernel-page-table.patch

* Re: [4.4] broken conversion from efi to kernel page table
2018-01-11 19:07 [4.4] broken conversion from efi to kernel page table Pavel Tatashin
2018-01-11 19:07 ` [4.4] x86/pti/efi: " Pavel Tatashin
@ 2018-01-12 10:04 ` Jiri Kosina
2018-01-12 13:52 ` Pavel Tatashin
2018-01-13 13:18 ` Greg KH
2 siblings, 1 reply; 8+ messages in thread
From: Jiri Kosina @ 2018-01-12 10:04 UTC
To: Pavel Tatashin
Cc: steven.sistare, linux-kernel, tglx, mingo, hpa, x86, gregkh,
hughd, dave.hansen, luto, torvalds

On Thu, 11 Jan 2018, Pavel Tatashin wrote:

> This fixes the boot panics and hangs that I reported in this thread:

Are you using EFI old_memmap?

If so, then

	https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/pti&id=de53c3786a3ce162a1c815d0c04c766c23ec9c0a

("x86/pti: Unbreak EFI old_memmap") should be the cure.

If not, you might be missing this in arch/x86/platform/efi/efi_64.c

-	efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
+	efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

Could you please let me know if either of these fixes your problem?

Thanks,

-- 
Jiri Kosina
SUSE Labs

* Re: [4.4] broken conversion from efi to kernel page table
2018-01-12 10:04 ` [4.4] broken conversion from efi to kernel page table Jiri Kosina
@ 2018-01-12 13:52 ` Pavel Tatashin
0 siblings, 0 replies; 8+ messages in thread
From: Pavel Tatashin @ 2018-01-12 13:52 UTC
To: Jiri Kosina
Cc: Steve Sistare, Linux Kernel Mailing List, Thomas Gleixner, mingo,
hpa, x86, Greg Kroah-Hartman, Hugh Dickins, dave.hansen,
Andy Lutomirski, Linus Torvalds

Hi Jiri,

This patch solves a different problem than _PAGE_NX. The problem it
solves is similar to the one addressed by

+	efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

where, after applying the PTI mask, CR3 can point to an invalid page
table. But this patch is specific to stable 4.4.110, since it uses
trampoline_pgd from realmode instead of allocating its own page table
as 4.6 and later do.

Thank you,
Pavel

On Fri, Jan 12, 2018 at 5:04 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Thu, 11 Jan 2018, Pavel Tatashin wrote:
>
>> This fixes the boot panics and hangs that I reported in this thread:
>
> Are you using EFI old_memmap?
>
> If so, then
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/pti&id=de53c3786a3ce162a1c815d0c04c766c23ec9c0a
>
> ("x86/pti: Unbreak EFI old_memmap") should be the cure.
>
> If not, you might be missing this in arch/x86/platform/efi/efi_64.c
>
> - efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
> + efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);
>
> Could you please let me know if either of these fixes your problem?
>
> Thanks,
>
> --
> Jiri Kosina
> SUSE Labs
>
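
The comparison with the __get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER) change can be illustrated in user space. The sketch below is not kernel code and rests on assumptions: PGD_ALLOCATION_ORDER is taken to be 1, an order-1 block is taken to be aligned to its own size (8 KiB), and the shadow pgd is taken to live at the kernel pgd address with the 0x1000 bit set. alloc_pages_sim(), shadow_pgd_of(), kernel_pgd_of(), pgd_is_safe() and check_pair() are hypothetical names; aligned_alloc() merely stands in for the page allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE                4096UL   /* assumption: 4 KiB pages */
#define PGD_ALLOCATION_ORDER     1        /* assumption: kernel pgd + shadow pgd */
#define KAISER_SHADOW_PGD_OFFSET 0x1000UL

/*
 * Hypothetical stand-in for __get_free_pages(): returns 2^order pages
 * aligned to the size of the whole block.
 */
static void *alloc_pages_sim(unsigned int order)
{
	size_t size = PAGE_SIZE << order;

	return aligned_alloc(size, size);
}

/* Switching to the shadow (user) pgd sets the offset bit. */
static uintptr_t shadow_pgd_of(uintptr_t kernel_pgd)
{
	return kernel_pgd | KAISER_SHADOW_PGD_OFFSET;
}

/* Switching back to the kernel pgd clears the offset bit. */
static uintptr_t kernel_pgd_of(uintptr_t any_pgd)
{
	return any_pgd & ~(uintptr_t)KAISER_SHADOW_PGD_OFFSET;
}

/* Returns 1 if clearing the offset bit leaves the pgd address unchanged. */
static int pgd_is_safe(uintptr_t pgd)
{
	return kernel_pgd_of(pgd) == pgd;
}

/* An order-1 block keeps the kernel/shadow round trip consistent. */
static void check_pair(void)
{
	void *block = alloc_pages_sim(PGD_ALLOCATION_ORDER);

	assert(block != NULL);
	assert(pgd_is_safe((uintptr_t)block));
	assert(kernel_pgd_of(shadow_pgd_of((uintptr_t)block)) == (uintptr_t)block);
	free(block);
}
```

A single page obtained without the extra alignment gives no such guarantee: if its address happens to have the 0x1000 bit set, pgd_is_safe() returns 0, which is the situation the 4.4 patch avoids for trampoline_pgd.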

* Re: [4.4] broken conversion from efi to kernel page table
2018-01-11 19:07 [4.4] broken conversion from efi to kernel page table Pavel Tatashin
2018-01-11 19:07 ` [4.4] x86/pti/efi: " Pavel Tatashin
2018-01-12 10:04 ` [4.4] broken conversion from efi to kernel page table Jiri Kosina
@ 2018-01-13 13:18 ` Greg KH
2018-01-13 13:35 ` Pavel Tatashin
2 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2018-01-13 13:18 UTC
To: Pavel Tatashin
Cc: steven.sistare, linux-kernel, tglx, mingo, hpa, x86, jkosina,
hughd, dave.hansen, luto, torvalds

On Thu, Jan 11, 2018 at 02:07:45PM -0500, Pavel Tatashin wrote:
> This fixes the boot panics and hangs that I reported in this thread:
>
> http://lkml.iu.edu/hypermail/linux/kernel/1801.1/00951.html
>
> I have not yet verified whether a similar issue applies to newer
> kernels.
>
> Re-sending it as a proper git send-email submission.
>
> Pavel Tatashin (1):
> x86/pti/efi: broken conversion from efi to kernel page table
>
> arch/x86/include/asm/kaiser.h | 8 ++++++++
> arch/x86/realmode/init.c | 4 +++-
> arch/x86/realmode/rm/trampoline_64.S | 3 ++-
> 3 files changed, 13 insertions(+), 2 deletions(-)

Many thanks for this work, and the patch, it's greatly appreciated.

Now queued up, thanks.

greg k-h

* Re: [4.4] broken conversion from efi to kernel page table
2018-01-13 13:18 ` Greg KH
@ 2018-01-13 13:35 ` Pavel Tatashin
2018-01-13 13:53 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Tatashin @ 2018-01-13 13:35 UTC
To: Greg KH
Cc: Steve Sistare, Linux Kernel Mailing List, Thomas Gleixner, mingo,
hpa, x86, Jiri Kosina, Hugh Dickins, dave.hansen,
Andy Lutomirski, Linus Torvalds

Hi Greg,

Make sure you apply:
[PATCH 4.4 v2] x86/pti/efi: broken conversion from efi to kernel page table

It is the same patch, but it fixes a build failure when compiled
without CONFIG_PAGE_TABLE_ISOLATION.

On Sat, Jan 13, 2018 at 8:18 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Jan 11, 2018 at 02:07:45PM -0500, Pavel Tatashin wrote:
>> This fixes the boot panics and hangs that I reported in this thread:
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1801.1/00951.html
>>
>> I have not yet verified whether a similar issue applies to newer
>> kernels.
>>
>> Re-sending it as a proper git send-email submission.
>>
>> Pavel Tatashin (1):
>> x86/pti/efi: broken conversion from efi to kernel page table
>>
>> arch/x86/include/asm/kaiser.h | 8 ++++++++
>> arch/x86/realmode/init.c | 4 +++-
>> arch/x86/realmode/rm/trampoline_64.S | 3 ++-
>> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> Many thanks for this work, and the patch, it's greatly appreciated.
>
> Now queued up, thanks.
>
> greg k-h

* Re: [4.4] broken conversion from efi to kernel page table
2018-01-13 13:35 ` Pavel Tatashin
@ 2018-01-13 13:53 ` Greg KH
0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2018-01-13 13:53 UTC
To: Pavel Tatashin
Cc: Steve Sistare, Linux Kernel Mailing List, Thomas Gleixner, mingo,
hpa, x86, Jiri Kosina, Hugh Dickins, dave.hansen,
Andy Lutomirski, Linus Torvalds

On Sat, Jan 13, 2018 at 08:35:05AM -0500, Pavel Tatashin wrote:
> Hi Greg,
>
> Make sure you apply:
> [PATCH 4.4 v2] x86/pti/efi: broken conversion from efi to kernel page table
>
> It is the same patch, but it fixes a build failure when compiled
> without CONFIG_PAGE_TABLE_ISOLATION.

Ah, thanks, now fixed up.

In the future, any stable patches should at least cc:
stable@vger.kernel.org, which is why I missed the v2 as it was buried
in my inbox. I'm still trying to dig all of the crap out of there at
the moment, it's a mess...

thanks again,

greg k-h