From: Borislav Petkov <bp@alien8.de>
To: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Alexander van Heukelum <heukelum@fastmail.fm>,
Andy Lutomirski <amluto@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Arjan van de Ven <arjan.van.de.ven@intel.com>,
Brian Gerst <brgerst@gmail.com>,
Alexandre Julliard <julliard@winehq.com>,
Andi Kleen <andi@firstfloor.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*
Date: Tue, 22 Apr 2014 13:25:50 +0200 [thread overview]
Message-ID: <20140422112550.GC15882@pd.tnic> (raw)
In-Reply-To: <1398120472-6190-1-git-send-email-hpa@linux.intel.com>
Just nitpicks below:
On Mon, Apr 21, 2014 at 03:47:52PM -0700, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the 32-bit kernel, but that
> implementation doesn't work for 64 bits.
>
> The 64-bit implementation works like this:
>
> Set up a ministack for each CPU, which is then mapped 65536 times
> using the page tables. This implementation uses the second-to-last
> PGD slot for this; with a 64-byte espfix stack this is sufficient for
> 2^18 CPUs (currently we support a max of 2^13 CPUs.)
I wish we'd put this description in the code instead of in a commit
message as those can get lost in git history over time.
> 64 bytes appear to be sufficient, because NMI and #MC cause a task
> switch.
>
> THIS IS A PROTOTYPE AND IS NOT COMPLETE. We need to make sure all
> code paths that can interrupt userspace execute this code.
> Fortunately we never need to use the espfix stack for nested faults,
> so one per CPU is guaranteed to be safe.
>
> Furthermore, this code adds unnecessary instructions to the common
> path. For example, on exception entry we push %rdi, pop %rdi, and
> then save away %rdi. Ideally we should do this in such a way that we
> avoid unnecessary swapgs, especially on the IRET path (the exception
> path is going to be very rare, and so is less critical.)
>
> Putting this version out there for people to look at/laugh at/play
> with.
>
> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> Link: http://lkml.kernel.org/r/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alexander van Heukelum <heukelum@fastmail.fm>
> Cc: Andy Lutomirski <amluto@gmail.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Alexandre Julliard <julliard@winehq.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
...
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1e96c3628bf2..7cc01770bf21 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -58,6 +58,7 @@
> #include <asm/asm.h>
> #include <asm/context_tracking.h>
> #include <asm/smap.h>
> +#include <asm/pgtable_types.h>
> #include <linux/err.h>
>
> /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
> @@ -1040,8 +1041,16 @@ restore_args:
> RESTORE_ARGS 1,8,1
>
> irq_return:
> + /*
> + * Are we returning to the LDT? Note: in 64-bit mode
> + * SS:RSP on the exception stack is always valid.
> + */
> + testb $4,(SS-RIP)(%rsp)
> + jnz irq_return_ldt
> +
> +irq_return_iret:
> INTERRUPT_RETURN
> - _ASM_EXTABLE(irq_return, bad_iret)
> + _ASM_EXTABLE(irq_return_iret, bad_iret)
>
> #ifdef CONFIG_PARAVIRT
> ENTRY(native_iret)
> @@ -1049,6 +1058,34 @@ ENTRY(native_iret)
> _ASM_EXTABLE(native_iret, bad_iret)
> #endif
>
> +irq_return_ldt:
> + pushq_cfi %rcx
> + larl (CS-RIP+8)(%rsp), %ecx
> + jnz 1f /* Invalid segment - will #GP at IRET time */
> + testl $0x00200000, %ecx
> + jnz 1f /* Returning to 64-bit mode */
> + larl (SS-RIP+8)(%rsp), %ecx
> + jnz 1f /* Invalid segment - will #SS at IRET time */
You mean " ... will #GP at IRET time"? But you're right, you're looking
at SS :-)
> + testl $0x00400000, %ecx
> + jnz 1f /* Not a 16-bit stack segment */
> + pushq_cfi %rsi
> + pushq_cfi %rdi
> + SWAPGS
> + movq PER_CPU_VAR(espfix_stack),%rdi
> + movl (RSP-RIP+3*8)(%rsp),%esi
> + xorw %si,%si
> + orq %rsi,%rdi
> + movq %rsp,%rsi
> + movl $8,%ecx
> + rep;movsq
> + leaq -(8*8)(%rdi),%rsp
> + SWAPGS
> + popq_cfi %rdi
> + popq_cfi %rsi
> +1:
> + popq_cfi %rcx
> + jmp irq_return_iret
> +
> .section .fixup,"ax"
> bad_iret:
> /*
...
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 85126ccbdf6b..dc2d8afcafe9 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -32,6 +32,7 @@
> * Manage page tables very early on.
> */
> extern pgd_t early_level4_pgt[PTRS_PER_PGD];
> +extern pud_t espfix_pud_page[PTRS_PER_PUD];
I guess you don't need the "extern" here.
> extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
> static unsigned int __initdata next_early_pgt = 2;
> pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
> diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
> index af1d14a9ebda..ebc987398923 100644
> --- a/arch/x86/kernel/ldt.c
> +++ b/arch/x86/kernel/ldt.c
> @@ -229,17 +229,6 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
> }
> }
>
> - /*
> - * On x86-64 we do not support 16-bit segments due to
> - * IRET leaking the high bits of the kernel stack address.
> - */
> -#ifdef CONFIG_X86_64
> - if (!ldt_info.seg_32bit) {
> - error = -EINVAL;
> - goto out_unlock;
> - }
> -#endif
> -
> fill_ldt(&ldt, &ldt_info);
> if (oldmode)
> ldt.avl = 0;
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 34826934d4a7..ff32efb14e33 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -244,6 +244,11 @@ static void notrace start_secondary(void *unused)
> check_tsc_sync_target();
>
> /*
> + * Enable the espfix hack for this CPU
> + */
> + init_espfix_cpu();
> +
> + /*
> * We need to hold vector_lock so there the set of online cpus
> * does not change while we are assigning vectors to cpus. Holding
> * this lock ensures we don't half assign or remove an irq from a cpu.
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 20621d753d5f..96bf767a05fc 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -327,6 +327,8 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
> int i;
> struct pg_state st = {};
>
> + st.to_dmesg = true;
Right, remove before applying :)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
next prev parent reply other threads:[~2014-04-22 11:26 UTC|newest]
Thread overview: 136+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-11 17:36 [tip:x86/urgent] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels tip-bot for H. Peter Anvin
2014-04-11 18:12 ` Andy Lutomirski
2014-04-11 18:20 ` H. Peter Anvin
2014-04-11 18:27 ` Brian Gerst
2014-04-11 18:29 ` H. Peter Anvin
2014-04-11 18:35 ` Brian Gerst
2014-04-11 21:16 ` Andy Lutomirski
2014-04-11 21:24 ` H. Peter Anvin
2014-04-11 21:53 ` Andy Lutomirski
2014-04-11 21:59 ` H. Peter Anvin
2014-04-11 22:15 ` Andy Lutomirski
2014-04-11 22:18 ` H. Peter Anvin
2014-04-13 4:20 ` H. Peter Anvin
2014-04-12 23:26 ` Alexander van Heukelum
2014-04-12 23:31 ` H. Peter Anvin
2014-04-12 23:49 ` Alexander van Heukelum
2014-04-13 0:03 ` H. Peter Anvin
2014-04-13 1:25 ` Andy Lutomirski
2014-04-13 1:29 ` Andy Lutomirski
2014-04-13 3:00 ` H. Peter Anvin
2014-04-11 21:34 ` Linus Torvalds
2014-04-11 18:41 ` Linus Torvalds
2014-04-11 18:45 ` Brian Gerst
2014-04-11 18:50 ` Linus Torvalds
2014-04-12 4:44 ` Brian Gerst
2014-04-12 17:18 ` H. Peter Anvin
2014-04-12 19:35 ` Borislav Petkov
2014-04-12 19:44 ` H. Peter Anvin
2014-04-12 20:11 ` Borislav Petkov
2014-04-12 20:34 ` Brian Gerst
2014-04-12 20:59 ` Borislav Petkov
2014-04-12 21:13 ` Brian Gerst
2014-04-12 21:40 ` Borislav Petkov
2014-04-14 7:21 ` Ingo Molnar
2014-04-14 9:44 ` Borislav Petkov
2014-04-14 9:47 ` Ingo Molnar
2014-04-12 21:53 ` Linus Torvalds
2014-04-12 22:25 ` H. Peter Anvin
2014-04-13 2:56 ` Andi Kleen
2014-04-13 3:02 ` H. Peter Anvin
2014-04-13 3:13 ` Linus Torvalds
2014-04-12 20:29 ` Brian Gerst
2014-04-14 7:48 ` Alexandre Julliard
2014-05-07 9:18 ` Sven Joachim
2014-05-07 10:18 ` Borislav Petkov
2014-05-07 16:57 ` Linus Torvalds
2014-05-07 17:09 ` H. Peter Anvin
2014-05-07 17:50 ` Alexandre Julliard
2014-05-08 6:43 ` Sven Joachim
2014-05-08 13:50 ` H. Peter Anvin
2014-05-08 20:13 ` H. Peter Anvin
2014-05-08 20:40 ` H. Peter Anvin
2014-05-12 13:16 ` Josh Boyer
2014-05-12 16:52 ` H. Peter Anvin
2014-05-14 23:43 ` [tip:x86/urgent] x86-64, modify_ldt: Make support for 16-bit segments a runtime option tip-bot for Linus Torvalds
2014-04-11 18:46 ` [tip:x86/urgent] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels H. Peter Anvin
2014-04-14 7:27 ` Ingo Molnar
2014-04-14 15:45 ` H. Peter Anvin
2014-04-13 2:54 ` Andi Kleen
2014-04-21 22:47 ` [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE* H. Peter Anvin
2014-04-21 23:19 ` Andrew Lutomirski
2014-04-21 23:29 ` H. Peter Anvin
2014-04-22 0:37 ` Andrew Lutomirski
2014-04-22 0:53 ` H. Peter Anvin
2014-04-22 1:06 ` Andrew Lutomirski
2014-04-22 1:14 ` H. Peter Anvin
2014-04-22 1:28 ` Andrew Lutomirski
2014-04-22 1:47 ` H. Peter Anvin
2014-04-22 1:53 ` Andrew Lutomirski
2014-04-22 11:23 ` Borislav Petkov
2014-04-22 14:46 ` Borislav Petkov
2014-04-22 16:03 ` Andrew Lutomirski
2014-04-22 16:10 ` H. Peter Anvin
2014-04-22 16:33 ` Andrew Lutomirski
2014-04-22 16:43 ` Linus Torvalds
2014-04-22 17:00 ` Andrew Lutomirski
2014-04-22 17:04 ` Linus Torvalds
2014-04-22 17:11 ` Andrew Lutomirski
2014-04-22 17:15 ` H. Peter Anvin
2014-04-23 9:54 ` One Thousand Gnomes
2014-04-23 15:53 ` H. Peter Anvin
2014-04-23 17:08 ` Andrew Lutomirski
2014-04-23 17:16 ` H. Peter Anvin
2014-04-23 17:25 ` Andrew Lutomirski
2014-04-23 17:28 ` H. Peter Anvin
2014-04-23 17:45 ` Andrew Lutomirski
2014-04-22 17:19 ` Linus Torvalds
2014-04-22 17:29 ` H. Peter Anvin
2014-04-22 17:46 ` Andrew Lutomirski
2014-04-22 17:59 ` H. Peter Anvin
2014-04-22 18:03 ` Brian Gerst
2014-04-22 18:06 ` H. Peter Anvin
2014-04-22 18:17 ` Brian Gerst
2014-04-22 18:51 ` H. Peter Anvin
2014-04-22 19:55 ` Brian Gerst
2014-04-22 20:17 ` H. Peter Anvin
2014-04-22 23:08 ` Brian Gerst
2014-04-22 23:39 ` Andi Kleen
2014-04-22 23:40 ` H. Peter Anvin
2014-04-22 17:11 ` H. Peter Anvin
2014-04-22 17:26 ` Borislav Petkov
2014-04-22 17:29 ` Andrew Lutomirski
2014-04-22 19:27 ` Borislav Petkov
2014-04-23 6:24 ` H. Peter Anvin
2014-04-23 8:57 ` Alexandre Julliard
2014-04-22 17:09 ` H. Peter Anvin
2014-04-22 17:20 ` Andrew Lutomirski
2014-04-22 17:24 ` H. Peter Anvin
2014-04-22 11:25 ` Borislav Petkov [this message]
2014-04-23 1:17 ` H. Peter Anvin
2014-04-23 1:23 ` Andrew Lutomirski
2014-04-23 1:42 ` H. Peter Anvin
2014-04-23 14:24 ` Boris Ostrovsky
2014-04-23 16:56 ` H. Peter Anvin
2014-04-28 13:04 ` Konrad Rzeszutek Wilk
2014-04-25 21:02 ` Konrad Rzeszutek Wilk
2014-04-25 21:16 ` H. Peter Anvin
2014-04-24 4:13 ` comex
2014-04-24 4:53 ` Andrew Lutomirski
2014-04-24 22:24 ` H. Peter Anvin
2014-04-24 22:31 ` Andrew Lutomirski
2014-04-24 22:37 ` H. Peter Anvin
2014-04-24 22:43 ` Andrew Lutomirski
2014-04-28 23:05 ` H. Peter Anvin
2014-04-28 23:08 ` H. Peter Anvin
2014-04-29 0:02 ` Andrew Lutomirski
2014-04-29 0:15 ` H. Peter Anvin
2014-04-29 0:20 ` Andrew Lutomirski
2014-04-29 2:38 ` H. Peter Anvin
2014-04-29 2:44 ` H. Peter Anvin
2014-04-29 3:45 ` H. Peter Anvin
2014-04-29 3:47 ` H. Peter Anvin
2014-04-29 4:36 ` H. Peter Anvin
2014-04-29 7:14 ` H. Peter Anvin
2014-04-25 12:02 ` Pavel Machek
2014-04-25 21:20 ` H. Peter Anvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140422112550.GC15882@pd.tnic \
--to=bp@alien8.de \
--cc=amluto@gmail.com \
--cc=andi@firstfloor.org \
--cc=arjan.van.de.ven@intel.com \
--cc=boris.ostrovsky@oracle.com \
--cc=brgerst@gmail.com \
--cc=heukelum@fastmail.fm \
--cc=hpa@linux.intel.com \
--cc=hpa@zytor.com \
--cc=julliard@winehq.com \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.