From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "H. Peter Anvin" <hpa@linux.intel.com>, boris.ostrovsky@oracle.com
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Alexander van Heukelum <heukelum@fastmail.fm>,
Andy Lutomirski <amluto@gmail.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Borislav Petkov <bp@alien8.de>,
Arjan van de Ven <arjan.van.de.ven@intel.com>,
Brian Gerst <brgerst@gmail.com>,
Alexandre Julliard <julliard@winehq.com>,
Andi Kleen <andi@firstfloor.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*
Date: Fri, 25 Apr 2014 17:02:41 -0400 [thread overview]
Message-ID: <20140425210241.GB30532@phenom.dumpdata.com> (raw)
In-Reply-To: <535714A1.9080502@linux.intel.com>
On Tue, Apr 22, 2014 at 06:17:21PM -0700, H. Peter Anvin wrote:
> Another spin of the prototype. This one avoids the espfix for anything
> but #GP, and avoids save/restore/saving registers... one can wonder,
> though, how much that actually matters in practice.
>
> It still does redundant SWAPGS on the slow path. I'm not sure I
> personally care enough to optimize that, as it means some fairly
> significant restructuring of some of the code paths. Some of that
> restructuring might actually be beneficial, but still...
Sorry about being late to the party.
.. snip..
> diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
> new file mode 100644
> index 000000000000..05567d706f92
> --- /dev/null
> +++ b/arch/x86/kernel/espfix_64.c
> @@ -0,0 +1,136 @@
> +/* ----------------------------------------------------------------------- *
> + *
> + * Copyright 2014 Intel Corporation; author: H. Peter Anvin
> + *
> + * This file is part of the Linux kernel, and is made available under
> + * the terms of the GNU General Public License version 2 or (at your
> + * option) any later version; incorporated herein by reference.
> + *
> + * ----------------------------------------------------------------------- */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/percpu.h>
> +#include <linux/gfp.h>
> +#include <asm/pgtable.h>
> +
> +#define ESPFIX_STACK_SIZE 64UL
> +#define ESPFIX_STACKS_PER_PAGE (PAGE_SIZE/ESPFIX_STACK_SIZE)
> +
> +#define ESPFIX_MAX_CPUS (ESPFIX_STACKS_PER_PAGE << (PGDIR_SHIFT-PAGE_SHIFT-16))
> +#if CONFIG_NR_CPUS > ESPFIX_MAX_CPUS
> +# error "Need more than one PGD for the ESPFIX hack"
> +#endif
> +
> +#define ESPFIX_BASE_ADDR (-2UL << PGDIR_SHIFT)
> +
> +#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
> +
> +/* This contains the *bottom* address of the espfix stack */
> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack);
> +
> +/* Initialization mutex - should this be a spinlock? */
> +static DEFINE_MUTEX(espfix_init_mutex);
> +
> +/* Page allocation bitmap - each page serves ESPFIX_STACKS_PER_PAGE CPUs */
> +#define ESPFIX_MAX_PAGES DIV_ROUND_UP(CONFIG_NR_CPUS, ESPFIX_STACKS_PER_PAGE)
> +#define ESPFIX_MAP_SIZE DIV_ROUND_UP(ESPFIX_MAX_PAGES, BITS_PER_LONG)
> +static unsigned long espfix_page_alloc_map[ESPFIX_MAP_SIZE];
> +
> +static __page_aligned_bss pud_t espfix_pud_page[PTRS_PER_PUD]
> + __aligned(PAGE_SIZE);
> +
> +/*
> + * This returns the bottom address of the espfix stack for a specific CPU.
> + * The math allows for a non-power-of-two ESPFIX_STACK_SIZE, in which case
> + * we have to account for some amount of padding at the end of each page.
> + */
> +static inline unsigned long espfix_base_addr(unsigned int cpu)
> +{
> + unsigned long page, addr;
> +
> + page = (cpu / ESPFIX_STACKS_PER_PAGE) << PAGE_SHIFT;
> + addr = page + (cpu % ESPFIX_STACKS_PER_PAGE) * ESPFIX_STACK_SIZE;
> + addr = (addr & 0xffffUL) | ((addr & ~0xffffUL) << 16);
> + addr += ESPFIX_BASE_ADDR;
> + return addr;
> +}
> +
> +#define PTE_STRIDE (65536/PAGE_SIZE)
> +#define ESPFIX_PTE_CLONES (PTRS_PER_PTE/PTE_STRIDE)
> +#define ESPFIX_PMD_CLONES PTRS_PER_PMD
> +#define ESPFIX_PUD_CLONES (65536/(ESPFIX_PTE_CLONES*ESPFIX_PMD_CLONES))
> +
> +void init_espfix_this_cpu(void)
> +{
> + unsigned int cpu, page;
> + unsigned long addr;
> + pgd_t pgd, *pgd_p;
> + pud_t pud, *pud_p;
> + pmd_t pmd, *pmd_p;
> + pte_t pte, *pte_p;
> + int n;
> + void *stack_page;
> + pteval_t ptemask;
> +
> + /* We only have to do this once... */
> + if (likely(this_cpu_read(espfix_stack)))
> + return; /* Already initialized */
> +
> + cpu = smp_processor_id();
> + addr = espfix_base_addr(cpu);
> + page = cpu/ESPFIX_STACKS_PER_PAGE;
> +
> + /* Did another CPU already set this up? */
> + if (likely(test_bit(page, espfix_page_alloc_map)))
> + goto done;
> +
> + mutex_lock(&espfix_init_mutex);
> +
> + /* Did we race on the lock? */
> + if (unlikely(test_bit(page, espfix_page_alloc_map)))
> + goto unlock_done;
> +
> + ptemask = __supported_pte_mask;
> +
> + pgd_p = &init_level4_pgt[pgd_index(addr)];
> + pgd = *pgd_p;
> + if (!pgd_present(pgd)) {
> + /* This can only happen on the BSP */
> + pgd = __pgd(__pa_symbol(espfix_pud_page) |
Any particular reason you are using __pgd
> + (_KERNPG_TABLE & ptemask));
> + set_pgd(pgd_p, pgd);
> + }
> +
> + pud_p = &espfix_pud_page[pud_index(addr)];
> + pud = *pud_p;
> + if (!pud_present(pud)) {
> + pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP);
> + pud = __pud(__pa(pmd_p) | (_KERNPG_TABLE & ptemask));
_pud
> + for (n = 0; n < ESPFIX_PUD_CLONES; n++)
> + set_pud(&pud_p[n], pud);
> + }
> +
> + pmd_p = pmd_offset(&pud, addr);
> + pmd = *pmd_p;
> + if (!pmd_present(pmd)) {
> + pte_p = (pte_t *)__get_free_page(PGALLOC_GFP);
> + pmd = __pmd(__pa(pte_p) | (_KERNPG_TABLE & ptemask));
and _pmd?
> + for (n = 0; n < ESPFIX_PMD_CLONES; n++)
> + set_pmd(&pmd_p[n], pmd);
> + }
> +
> + pte_p = pte_offset_kernel(&pmd, addr);
> + stack_page = (void *)__get_free_page(GFP_KERNEL);
> + pte = __pte(__pa(stack_page) | (__PAGE_KERNEL & ptemask));
and __pte instead of the 'pmd', 'pud', 'pmd' and 'pte' macros?
> + for (n = 0; n < ESPFIX_PTE_CLONES; n++)
> + set_pte(&pte_p[n*PTE_STRIDE], pte);
> +
> + /* Job is done for this CPU and any CPU which shares this page */
> + set_bit(page, espfix_page_alloc_map);
> +
> +unlock_done:
> + mutex_unlock(&espfix_init_mutex);
> +done:
> + this_cpu_write(espfix_stack, addr);
> +}
next prev parent reply other threads:[~2014-04-25 21:03 UTC|newest]
Thread overview: 136+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-11 17:36 [tip:x86/urgent] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels tip-bot for H. Peter Anvin
2014-04-11 18:12 ` Andy Lutomirski
2014-04-11 18:20 ` H. Peter Anvin
2014-04-11 18:27 ` Brian Gerst
2014-04-11 18:29 ` H. Peter Anvin
2014-04-11 18:35 ` Brian Gerst
2014-04-11 21:16 ` Andy Lutomirski
2014-04-11 21:24 ` H. Peter Anvin
2014-04-11 21:53 ` Andy Lutomirski
2014-04-11 21:59 ` H. Peter Anvin
2014-04-11 22:15 ` Andy Lutomirski
2014-04-11 22:18 ` H. Peter Anvin
2014-04-13 4:20 ` H. Peter Anvin
2014-04-12 23:26 ` Alexander van Heukelum
2014-04-12 23:31 ` H. Peter Anvin
2014-04-12 23:49 ` Alexander van Heukelum
2014-04-13 0:03 ` H. Peter Anvin
2014-04-13 1:25 ` Andy Lutomirski
2014-04-13 1:29 ` Andy Lutomirski
2014-04-13 3:00 ` H. Peter Anvin
2014-04-11 21:34 ` Linus Torvalds
2014-04-11 18:41 ` Linus Torvalds
2014-04-11 18:45 ` Brian Gerst
2014-04-11 18:50 ` Linus Torvalds
2014-04-12 4:44 ` Brian Gerst
2014-04-12 17:18 ` H. Peter Anvin
2014-04-12 19:35 ` Borislav Petkov
2014-04-12 19:44 ` H. Peter Anvin
2014-04-12 20:11 ` Borislav Petkov
2014-04-12 20:34 ` Brian Gerst
2014-04-12 20:59 ` Borislav Petkov
2014-04-12 21:13 ` Brian Gerst
2014-04-12 21:40 ` Borislav Petkov
2014-04-14 7:21 ` Ingo Molnar
2014-04-14 9:44 ` Borislav Petkov
2014-04-14 9:47 ` Ingo Molnar
2014-04-12 21:53 ` Linus Torvalds
2014-04-12 22:25 ` H. Peter Anvin
2014-04-13 2:56 ` Andi Kleen
2014-04-13 3:02 ` H. Peter Anvin
2014-04-13 3:13 ` Linus Torvalds
2014-04-12 20:29 ` Brian Gerst
2014-04-14 7:48 ` Alexandre Julliard
2014-05-07 9:18 ` Sven Joachim
2014-05-07 10:18 ` Borislav Petkov
2014-05-07 16:57 ` Linus Torvalds
2014-05-07 17:09 ` H. Peter Anvin
2014-05-07 17:50 ` Alexandre Julliard
2014-05-08 6:43 ` Sven Joachim
2014-05-08 13:50 ` H. Peter Anvin
2014-05-08 20:13 ` H. Peter Anvin
2014-05-08 20:40 ` H. Peter Anvin
2014-05-12 13:16 ` Josh Boyer
2014-05-12 16:52 ` H. Peter Anvin
2014-05-14 23:43 ` [tip:x86/urgent] x86-64, modify_ldt: Make support for 16-bit segments a runtime option tip-bot for Linus Torvalds
2014-04-11 18:46 ` [tip:x86/urgent] x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels H. Peter Anvin
2014-04-14 7:27 ` Ingo Molnar
2014-04-14 15:45 ` H. Peter Anvin
2014-04-13 2:54 ` Andi Kleen
2014-04-21 22:47 ` [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE* H. Peter Anvin
2014-04-21 23:19 ` Andrew Lutomirski
2014-04-21 23:29 ` H. Peter Anvin
2014-04-22 0:37 ` Andrew Lutomirski
2014-04-22 0:53 ` H. Peter Anvin
2014-04-22 1:06 ` Andrew Lutomirski
2014-04-22 1:14 ` H. Peter Anvin
2014-04-22 1:28 ` Andrew Lutomirski
2014-04-22 1:47 ` H. Peter Anvin
2014-04-22 1:53 ` Andrew Lutomirski
2014-04-22 11:23 ` Borislav Petkov
2014-04-22 14:46 ` Borislav Petkov
2014-04-22 16:03 ` Andrew Lutomirski
2014-04-22 16:10 ` H. Peter Anvin
2014-04-22 16:33 ` Andrew Lutomirski
2014-04-22 16:43 ` Linus Torvalds
2014-04-22 17:00 ` Andrew Lutomirski
2014-04-22 17:04 ` Linus Torvalds
2014-04-22 17:11 ` Andrew Lutomirski
2014-04-22 17:15 ` H. Peter Anvin
2014-04-23 9:54 ` One Thousand Gnomes
2014-04-23 15:53 ` H. Peter Anvin
2014-04-23 17:08 ` Andrew Lutomirski
2014-04-23 17:16 ` H. Peter Anvin
2014-04-23 17:25 ` Andrew Lutomirski
2014-04-23 17:28 ` H. Peter Anvin
2014-04-23 17:45 ` Andrew Lutomirski
2014-04-22 17:19 ` Linus Torvalds
2014-04-22 17:29 ` H. Peter Anvin
2014-04-22 17:46 ` Andrew Lutomirski
2014-04-22 17:59 ` H. Peter Anvin
2014-04-22 18:03 ` Brian Gerst
2014-04-22 18:06 ` H. Peter Anvin
2014-04-22 18:17 ` Brian Gerst
2014-04-22 18:51 ` H. Peter Anvin
2014-04-22 19:55 ` Brian Gerst
2014-04-22 20:17 ` H. Peter Anvin
2014-04-22 23:08 ` Brian Gerst
2014-04-22 23:39 ` Andi Kleen
2014-04-22 23:40 ` H. Peter Anvin
2014-04-22 17:11 ` H. Peter Anvin
2014-04-22 17:26 ` Borislav Petkov
2014-04-22 17:29 ` Andrew Lutomirski
2014-04-22 19:27 ` Borislav Petkov
2014-04-23 6:24 ` H. Peter Anvin
2014-04-23 8:57 ` Alexandre Julliard
2014-04-22 17:09 ` H. Peter Anvin
2014-04-22 17:20 ` Andrew Lutomirski
2014-04-22 17:24 ` H. Peter Anvin
2014-04-22 11:25 ` Borislav Petkov
2014-04-23 1:17 ` H. Peter Anvin
2014-04-23 1:23 ` Andrew Lutomirski
2014-04-23 1:42 ` H. Peter Anvin
2014-04-23 14:24 ` Boris Ostrovsky
2014-04-23 16:56 ` H. Peter Anvin
2014-04-28 13:04 ` Konrad Rzeszutek Wilk
2014-04-25 21:02 ` Konrad Rzeszutek Wilk [this message]
2014-04-25 21:16 ` H. Peter Anvin
2014-04-24 4:13 ` comex
2014-04-24 4:53 ` Andrew Lutomirski
2014-04-24 22:24 ` H. Peter Anvin
2014-04-24 22:31 ` Andrew Lutomirski
2014-04-24 22:37 ` H. Peter Anvin
2014-04-24 22:43 ` Andrew Lutomirski
2014-04-28 23:05 ` H. Peter Anvin
2014-04-28 23:08 ` H. Peter Anvin
2014-04-29 0:02 ` Andrew Lutomirski
2014-04-29 0:15 ` H. Peter Anvin
2014-04-29 0:20 ` Andrew Lutomirski
2014-04-29 2:38 ` H. Peter Anvin
2014-04-29 2:44 ` H. Peter Anvin
2014-04-29 3:45 ` H. Peter Anvin
2014-04-29 3:47 ` H. Peter Anvin
2014-04-29 4:36 ` H. Peter Anvin
2014-04-29 7:14 ` H. Peter Anvin
2014-04-25 12:02 ` Pavel Machek
2014-04-25 21:20 ` H. Peter Anvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140425210241.GB30532@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=amluto@gmail.com \
--cc=andi@firstfloor.org \
--cc=arjan.van.de.ven@intel.com \
--cc=boris.ostrovsky@oracle.com \
--cc=bp@alien8.de \
--cc=brgerst@gmail.com \
--cc=heukelum@fastmail.fm \
--cc=hpa@linux.intel.com \
--cc=hpa@zytor.com \
--cc=julliard@winehq.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.