* [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
@ 2011-08-05 13:15 Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton
Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Hi,
a small refinement of the patchset from yesterday per hpa's comments:
* put mask and flags into a single cacheline and make it __read_mostly
* change alignment computation back to clearing bits [14:12] so that a
mask of 0x0 has no effect on the address.
Please take a look and apply, if no objections.
Thanks.
---
Changelog:
v3:
here's an updated and revised patchset addressing all comments from last
time:
* saturate bits [14:12] instead of clearing them
* calculate the mask from the CPUID 0x8000_0005 IC identifier instead of
hardcoding it
v2:
here's the second version of this patch which actually turned into a
small patchset. As Ingo suggested, the initial patch stays first to ease
backporting and the following 3 patches address (hopefully) all review
comments from the initial submission. The patchset has been tested with
Debian's old stable lenny (i.e. 5.0) distro in a 32-bit environment and
all worked as expected.
Below is some performance data showing that the changeset introduces no
noticeable performance degradation.
So please, do take a look again and let me know.
Thanks.
VA alignment enabled
====================
Performance counter stats for './build.sh' (10 runs):
3187047.935990 task-clock # 24.001 CPUs utilized ( +- 1.37% )
510,888 context-switches # 0.000 M/sec ( +- 0.44% )
60,712 CPU-migrations # 0.000 M/sec ( +- 0.51% )
26,046,891 page-faults # 0.008 M/sec ( +- 0.00% )
1,841,068,123,735 cycles # 0.578 GHz ( +- 1.10% ) [63.39%]
560,044,437,348 stalled-cycles-frontend # 30.42% frontend cycles idle ( +- 1.13% ) [64.65%]
436,165,228,465 stalled-cycles-backend # 23.69% backend cycles idle ( +- 1.19% ) [67.21%]
1,461,854,088,667 instructions # 0.79 insns per cycle
# 0.38 stalled cycles per insn ( +- 0.77% ) [70.31%]
334,169,452,362 branches # 104.852 M/sec ( +- 1.20% ) [69.43%]
21,485,007,982 branch-misses # 6.43% of all branches ( +- 0.68% ) [65.01%]
132.787483539 seconds time elapsed ( +- 1.37% )
VA alignment disabled
=====================
Performance counter stats for './build.sh' (10 runs):
3173688.887193 task-clock # 24.001 CPUs utilized ( +- 1.37% )
511,425 context-switches # 0.000 M/sec ( +- 0.28% )
60,522 CPU-migrations # 0.000 M/sec ( +- 0.60% )
26,046,902 page-faults # 0.008 M/sec ( +- 0.00% )
1,832,825,813,094 cycles # 0.578 GHz ( +- 0.96% ) [63.60%]
563,123,451,900 stalled-cycles-frontend # 30.72% frontend cycles idle ( +- 0.96% ) [63.97%]
439,565,070,106 stalled-cycles-backend # 23.98% backend cycles idle ( +- 1.23% ) [66.69%]
1,465,314,643,020 instructions # 0.80 insns per cycle
# 0.38 stalled cycles per insn ( +- 0.74% ) [70.11%]
332,416,669,982 branches # 104.741 M/sec ( +- 0.85% ) [69.71%]
21,181,821,204 branch-misses # 6.37% of all branches ( +- 0.97% ) [65.93%]
132.230903628 seconds time elapsed ( +- 1.37% )
stock 3.0
=========
Performance counter stats for './build.sh' (10 runs):
3369707.240439 task-clock # 24.001 CPUs utilized ( +- 1.18% )
510,450 context-switches # 0.000 M/sec ( +- 0.29% )
58,906 CPU-migrations # 0.000 M/sec ( +- 0.35% )
26,057,272 page-faults # 0.008 M/sec ( +- 0.00% )
1,836,326,075,063 cycles # 0.545 GHz ( +- 1.05% ) [63.51%]
561,850,647,545 stalled-cycles-frontend # 30.60% frontend cycles idle ( +- 1.03% ) [64.17%]
439,923,021,200 stalled-cycles-backend # 23.96% backend cycles idle ( +- 1.10% ) [66.64%]
1,467,236,934,265 instructions # 0.80 insns per cycle
# 0.38 stalled cycles per insn ( +- 0.87% ) [70.06%]
331,937,054,120 branches # 98.506 M/sec ( +- 0.81% ) [69.83%]
21,228,553,080 branch-misses # 6.40% of all branches ( +- 0.87% ) [65.79%]
140.398317711 seconds time elapsed ( +- 1.18% )
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH -v3.1 1/3] x86, AMD: Correct F15h IC aliasing issue
2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
2011-08-05 22:58 ` [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h tip-bot for Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper Borislav Petkov
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton
Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.
This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.
This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits to be the same for all mappings of a single shared
library across processes and thereby avoiding instruction cache aliases.
It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.
This change leaves virtual region address allocation on other families
and/or vendors unaffected.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
Documentation/kernel-parameters.txt | 13 ++++++
arch/x86/include/asm/elf.h | 31 +++++++++++++
arch/x86/kernel/cpu/amd.c | 13 ++++++
arch/x86/kernel/sys_x86_64.c | 81 +++++++++++++++++++++++++++++++++-
arch/x86/mm/mmap.c | 15 ------
arch/x86/vdso/vma.c | 9 ++++
6 files changed, 144 insertions(+), 18 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index aa47be7..af73c03 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -299,6 +299,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
behaviour to be specified. Bit 0 enables warnings,
bit 1 enables fixups, and bit 2 sends a segfault.
+ align_va_addr= [X86-64]
+ Align virtual addresses by clearing slice [14:12] when
+ allocating a VMA at process creation time. This option
+ gives you up to 3% performance improvement on AMD F15h
+ machines (where it is enabled by default) for a
+ CPU-intensive style benchmark, and it can vary highly in
+ a microbenchmark depending on workload and compiler.
+
+ 32: only for 32-bit processes
+ 64: only for 64-bit processes
+ on: enable for both 32- and 64-bit processes
+ off: disable for both 32- and 64-bit processes
+
amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f2ad216..5f962df 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -4,6 +4,7 @@
/*
* ELF register definitions..
*/
+#include <linux/thread_info.h>
#include <asm/ptrace.h>
#include <asm/user.h>
@@ -320,4 +321,34 @@ extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
extern unsigned long arch_randomize_brk(struct mm_struct *mm);
#define arch_randomize_brk arch_randomize_brk
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+#ifdef CONFIG_X86_32
+ return 1;
+#endif
+#ifdef CONFIG_IA32_EMULATION
+ if (test_thread_flag(TIF_IA32))
+ return 1;
+#endif
+ return 0;
+}
+
+/* The first two values are special, do not change. See align_addr() */
+enum align_flags {
+ ALIGN_VA_32 = BIT(0),
+ ALIGN_VA_64 = BIT(1),
+ ALIGN_VDSO = BIT(2),
+ ALIGN_TOPDOWN = BIT(3),
+};
+
+struct va_alignment {
+ int flags;
+ unsigned long mask;
+} ____cacheline_aligned;
+
+extern struct va_alignment va_align;
+extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
#endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b13ed39..b0234bc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
"with P0 frequency!\n");
}
}
+
+ if (c->x86 == 0x15) {
+ unsigned long upperbit;
+ u32 cpuid, assoc;
+
+ cpuid = cpuid_edx(0x80000005);
+ assoc = cpuid >> 16 & 0xff;
+ upperbit = ((cpuid >> 24) << 10) / assoc;
+
+ va_align.mask = (upperbit - 1) & PAGE_MASK;
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+
+ }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index ff14a50..aaa8d09 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,6 +18,72 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
+struct __read_mostly va_alignment va_align = {
+ .flags = -1,
+};
+
+/*
+ * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
+ *
+ * @flags denotes the allocation direction - bottomup or topdown -
+ * or vDSO; see call sites below.
+ */
+unsigned long align_addr(unsigned long addr, struct file *filp,
+ enum align_flags flags)
+{
+ unsigned long tmp_addr;
+
+ /* handle 32- and 64-bit case with a single conditional */
+ if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
+ return addr;
+
+ if (!(current->flags & PF_RANDOMIZE))
+ return addr;
+
+ if (!((flags & ALIGN_VDSO) || filp))
+ return addr;
+
+ tmp_addr = addr;
+
+ /*
+ * We need an address which is <= than the original
+ * one only when in topdown direction.
+ */
+ if (!(flags & ALIGN_TOPDOWN))
+ tmp_addr += va_align.mask;
+
+ tmp_addr &= ~va_align.mask;
+
+ return tmp_addr;
+}
+
+static int __init control_va_addr_alignment(char *str)
+{
+ /* guard against enabling this on other CPU families */
+ if (va_align.flags < 0)
+ return 1;
+
+ if (*str == 0)
+ return 1;
+
+ if (*str == '=')
+ str++;
+
+ if (!strcmp(str, "32"))
+ va_align.flags = ALIGN_VA_32;
+ else if (!strcmp(str, "64"))
+ va_align.flags = ALIGN_VA_64;
+ else if (!strcmp(str, "off"))
+ va_align.flags = 0;
+ else if (!strcmp(str, "on"))
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+ else
+ return 0;
+
+ return 1;
+}
+__setup("align_va_addr", control_va_addr_alignment);
+
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, off)
@@ -92,6 +158,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
start_addr = addr;
full_search:
+
+ addr = align_addr(addr, filp, 0);
+
for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
/* At this point: (!vma || addr < vma->vm_end). */
if (end - len < addr) {
@@ -117,6 +186,7 @@ full_search:
mm->cached_hole_size = vma->vm_start - addr;
addr = vma->vm_end;
+ addr = align_addr(addr, filp, 0);
}
}
@@ -161,10 +231,13 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
/* make sure it can fit in the remaining address space */
if (addr > len) {
- vma = find_vma(mm, addr-len);
- if (!vma || addr <= vma->vm_start)
+ unsigned long tmp_addr = align_addr(addr - len, filp,
+ ALIGN_TOPDOWN);
+
+ vma = find_vma(mm, tmp_addr);
+ if (!vma || tmp_addr + len <= vma->vm_start)
/* remember the address as a hint for next time */
- return mm->free_area_cache = addr-len;
+ return mm->free_area_cache = tmp_addr;
}
if (mm->mmap_base < len)
@@ -173,6 +246,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
addr = mm->mmap_base-len;
do {
+ addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+
/*
* Lookup failure means no vma is above this address,
* else if new region fits below vma->vm_start,
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1dab519..d4c0736 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -51,21 +51,6 @@ static unsigned int stack_maxrandom_size(void)
#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
#define MAX_GAP (TASK_SIZE/6*5)
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static int mmap_is_ia32(void)
-{
-#ifdef CONFIG_X86_32
- return 1;
-#endif
-#ifdef CONFIG_IA32_EMULATION
- if (test_thread_flag(TIF_IA32))
- return 1;
-#endif
- return 0;
-}
-
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 7abd2be..caa42ce 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -69,6 +69,15 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
addr = start + (offset << PAGE_SHIFT);
if (addr >= end)
addr = end;
+
+ /*
+ * page-align it here so that get_unmapped_area doesn't
+ * align it wrongfully again to the next page. addr can come in 4K
+ * unaligned here as a result of stack start randomization.
+ */
+ addr = PAGE_ALIGN(addr);
+ addr = align_addr(addr, NULL, ALIGN_VDSO);
+
return addr;
}
--
1.7.4.rc2
* [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper
2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 3/3] x86, AMD: Move BSP code to " Borislav Petkov
2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
3 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton
Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Add a function ptr to struct x86_cpuinit_ops which is destined to be run
only once on the BSP during boot.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/x86_init.c | 1 +
3 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index d3d8590..08994a0 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -147,6 +147,7 @@ struct x86_init_ops {
*/
struct x86_cpuinit_ops {
void (*setup_percpu_clockev)(void);
+ void (*run_on_bsp)(void);
};
/**
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..465e633 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
filter_cpuid_features(c, false);
setup_smep(c);
+
+ x86_cpuinit.run_on_bsp();
}
void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f164bd..76b37ed 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -90,6 +90,7 @@ struct x86_init_ops x86_init __initdata = {
struct x86_cpuinit_ops x86_cpuinit __cpuinitdata = {
.setup_percpu_clockev = setup_secondary_APIC_clock,
+ .run_on_bsp = x86_init_noop,
};
static void default_nmi_init(void) { };
--
1.7.4.rc2
* [PATCH -v3.1 3/3] x86, AMD: Move BSP code to cpuinit helper
2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper Borislav Petkov
@ 2011-08-05 13:15 ` Borislav Petkov
2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
3 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 13:15 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton
Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Move code which is run once on the BSP during boot into the cpuinit
helper.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/amd.c | 60 +++++++++++++++++++++++---------------------
1 files changed, 31 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..16939b8 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,36 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}
+static void __cpuinit amd_run_on_bsp(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
+
+ if (c->x86 == 0x15) {
+ unsigned long upperbit;
+ u32 cpuid, assoc;
+
+ cpuid = cpuid_edx(0x80000005);
+ assoc = cpuid >> 16 & 0xff;
+ upperbit = ((cpuid >> 24) << 10) / assoc;
+
+ va_align.mask = (upperbit - 1) & PAGE_MASK;
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+ }
+}
+
static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
{
early_init_amd_mc(c);
@@ -442,35 +472,7 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
}
#endif
- /* We need to do the following only once */
- if (c != &boot_cpu_data)
- return;
-
- if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
- if (c->x86 > 0x10 ||
- (c->x86 == 0x10 && c->x86_model >= 0x2)) {
- u64 val;
-
- rdmsrl(MSR_K7_HWCR, val);
- if (!(val & BIT(24)))
- printk(KERN_WARNING FW_BUG "TSC doesn't count "
- "with P0 frequency!\n");
- }
- }
-
- if (c->x86 == 0x15) {
- unsigned long upperbit;
- u32 cpuid, assoc;
-
- cpuid = cpuid_edx(0x80000005);
- assoc = cpuid >> 16 & 0xff;
- upperbit = ((cpuid >> 24) << 10) / assoc;
-
- va_align.mask = (upperbit - 1) & PAGE_MASK;
- va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
-
- }
+ x86_cpuinit.run_on_bsp = amd_run_on_bsp;
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
--
1.7.4.rc2
* Re: [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
` (2 preceding siblings ...)
2011-08-05 13:15 ` [PATCH -v3.1 3/3] x86, AMD: Move BSP code to " Borislav Petkov
@ 2011-08-05 17:10 ` H. Peter Anvin
2011-08-05 17:55 ` Borislav Petkov
3 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 17:10 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
On 08/05/2011 06:15 AM, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@amd.com>
>
> Hi,
>
> a small refinement of the patchset from yesterday per hpa's comments:
>
> * put mask and flags into a single cacheline and make it __read_mostly
>
> * change alignment computation back to clearing bits [14:12] so that a
> mask of 0x0 can have no effect on the address.
>
> Please take a look and apply, if no objections.
>
Patch 1 looks good now.
Patch 2 I'm going to object to because it puts your run_on_bsp method
into a different structure where all the existing methods for this
already are in a way that looks totally gratuitous to me. Why not just
have a c_bsp_init on struct cpu_dev like all the other methods?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue
2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
@ 2011-08-05 17:55 ` Borislav Petkov
2011-08-05 18:01 ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
2011-08-05 18:04 ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
0 siblings, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 17:55 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
On Fri, Aug 05, 2011 at 01:10:29PM -0400, H. Peter Anvin wrote:
> Patch 2 I'm going to object to because it puts your run_on_bsp method
> into a different structure where all the existing methods for this
> already are in a way that looks totally gratuitous to me. Why not just
> have a c_bsp_init on struct cpu_dev like all the other methods?
Indeed, cpu_dev is actually the only logical place to put them, thanks.
I'm sending updated versions as a reply to this message.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper
2011-08-05 17:55 ` Borislav Petkov
@ 2011-08-05 18:01 ` Borislav Petkov
2011-08-05 22:58 ` [tip:x86/cpu] " tip-bot for Borislav Petkov
2011-08-05 18:04 ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
1 sibling, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 18:01 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/common.c | 3 +++
arch/x86/kernel/cpu/cpu.h | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..8ed394a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
filter_cpuid_features(c, false);
setup_smep(c);
+
+ if (this_cpu->c_bsp_init)
+ this_cpu->c_bsp_init(c);
}
void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e765633..1b22dcc 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -18,6 +18,7 @@ struct cpu_dev {
struct cpu_model_info c_models[4];
void (*c_early_init)(struct cpuinfo_x86 *);
+ void (*c_bsp_init)(struct cpuinfo_x86 *);
void (*c_init)(struct cpuinfo_x86 *);
void (*c_identify)(struct cpuinfo_x86 *);
unsigned int (*c_size_cache)(struct cpuinfo_x86 *, unsigned int);
--
1.7.4.rc2
* [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
2011-08-05 17:55 ` Borislav Petkov
2011-08-05 18:01 ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
@ 2011-08-05 18:04 ` Borislav Petkov
2011-08-05 20:07 ` H. Peter Anvin
2011-08-05 22:59 ` [tip:x86/cpu] x86, amd: " tip-bot for Borislav Petkov
1 sibling, 2 replies; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 18:04 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
Move code which is run once on the BSP during boot into the cpu_dev
helper.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/amd.c | 59 ++++++++++++++++++++++-----------------------
1 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..53d96f5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,34 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}
+static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
+{
+ if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
+
+ if (c->x86 == 0x15) {
+ unsigned long upperbit;
+ u32 cpuid, assoc;
+
+ cpuid = cpuid_edx(0x80000005);
+ assoc = cpuid >> 16 & 0xff;
+ upperbit = ((cpuid >> 24) << 10) / assoc;
+
+ va_align.mask = (upperbit - 1) & PAGE_MASK;
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+ }
+}
+
static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
{
early_init_amd_mc(c);
@@ -441,36 +469,6 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
-
- /* We need to do the following only once */
- if (c != &boot_cpu_data)
- return;
-
- if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
- if (c->x86 > 0x10 ||
- (c->x86 == 0x10 && c->x86_model >= 0x2)) {
- u64 val;
-
- rdmsrl(MSR_K7_HWCR, val);
- if (!(val & BIT(24)))
- printk(KERN_WARNING FW_BUG "TSC doesn't count "
- "with P0 frequency!\n");
- }
- }
-
- if (c->x86 == 0x15) {
- unsigned long upperbit;
- u32 cpuid, assoc;
-
- cpuid = cpuid_edx(0x80000005);
- assoc = cpuid >> 16 & 0xff;
- upperbit = ((cpuid >> 24) << 10) / assoc;
-
- va_align.mask = (upperbit - 1) & PAGE_MASK;
- va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
-
- }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
@@ -692,6 +690,7 @@ static const struct cpu_dev __cpuinitconst amd_cpu_dev = {
.c_size_cache = amd_size_cache,
#endif
.c_early_init = early_init_amd,
+ .c_bsp_init = bsp_init_amd,
.c_init = init_amd,
.c_x86_vendor = X86_VENDOR_AMD,
};
--
1.7.4.rc2
* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
2011-08-05 18:04 ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
@ 2011-08-05 20:07 ` H. Peter Anvin
2011-08-05 22:52 ` Borislav Petkov
2011-08-05 22:59 ` [tip:x86/cpu] x86, amd: " tip-bot for Borislav Petkov
1 sibling, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 20:07 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
On 08/05/2011 11:04 AM, Borislav Petkov wrote:
> Move code which is run once on the BSP during boot into the cpu_dev
> helper.
> +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
> +{
> + if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +
You can't use static_cpu_has() here, since this code runs before
alternatives -- it will always be false. Furthermore, for code that
only runs once, it is never a win to do patching.
Arguably bsp_init should be __init and not __cpuinit, but I don't know
how to make that work with the machinery, and is something that can be
fixed anyway.
-hpa
* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
2011-08-05 20:07 ` H. Peter Anvin
@ 2011-08-05 22:52 ` Borislav Petkov
2011-08-05 22:56 ` H. Peter Anvin
0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-05 22:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton, Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
On Fri, Aug 05, 2011 at 04:07:40PM -0400, H. Peter Anvin wrote:
> On 08/05/2011 11:04 AM, Borislav Petkov wrote:
> > Move code which is run once on the BSP during boot into the cpu_dev
> > helper.
> > +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
> > +{
> > + if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> > +
>
> You can't use static_cpu_has() here, since this code runs before
> alternatives -- it will always be false. Furthermore, for code that
> only runs once, it is never a win to do patching.
Oh crap, this is a leftover from when run_on_bsp was a struct
x86_cpuinit_ops member with no args. And I f*cked it up even then
although I went and got myself a pointer to boot_cpu_data:
+static void __cpuinit amd_run_on_bsp(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
but forgot to use it. Good catch, will fix it tomorrow.
> Arguably bsp_init should be __init and not __cpuinit, but I don't know
> how to make that work with the machinery, and is something that can be
> fixed anyway.
Yeah, how do we do that? struct cpu_dev is __cpuinitconst,
x86_cpuinit_ops is __cpuinitdata.
We could add it to identify_boot_cpu() - there's already some per-vendor
stuff like init_amd_e400_c1e_mask() which wouldn't hurt to be behind a
vendor check. early_identify_cpu() already does the vendor check with
get_cpu_vendor(), so later, in identify_cpu(), we could add a run_on_bsp()
which is __init and switch/case on the ->x86_vendor inside.
Then we can collect all the run-once-on-the-BSP code in there.
Hmmm..
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* Re: [PATCH -v3.2 3/3] x86, AMD: Move BSP code to cpu_dev helper
2011-08-05 22:52 ` Borislav Petkov
@ 2011-08-05 22:56 ` H. Peter Anvin
0 siblings, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-05 22:56 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Andrew Morton,
Avi Kivity, Przywara, Andre, Pohlack, Martin, LKML
On 08/05/2011 03:52 PM, Borislav Petkov wrote:
> On Fri, Aug 05, 2011 at 04:07:40PM -0400, H. Peter Anvin wrote:
>> On 08/05/2011 11:04 AM, Borislav Petkov wrote:
>>> Move code which is run once on the BSP during boot into the cpu_dev
>>> helper.
>>> +static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
>>> +{
>>> + if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
>>> +
>>
>> You can't use static_cpu_has() here, since this code runs before
>> alternatives -- it will always be false. Furthermore, for code that
>> only runs once, it is never a win to do patching.
>
> Oh crap, this is a leftover from when run_on_bsp was struct
> x86_cpuinit_ops member with no args. And I f*cked it up even then
> although I went and got myself a pointer to boot_cpu_data:
>
I fixed it up directly.
>
>> Arguably bsp_init should be __init and not __cpuinit, but I don't know
>> how to make that work with the machinery, and is something that can be
>> fixed anyway.
>
> Yeah, how do we do that? struct cpu_dev is __cpuinitconst,
> x86_cpuinit_ops is __cpuinitdata.
>
> We could add it to identify_boot_cpu() - there's already some per-vendor
> stuff like init_amd_e400_c1e_mask() which wouldn't hurt to be behind a
> vendor check. early_identify_cpu() does already the vendor check with
> get_cpu_vendor() so later, in identify_cpu() we could add a run_on_bsp()
> which is __init and switch/case on the ->x86_vendor inside.
>
> Then we can collect all the run-once-on-the-BSP code in there.
>
> Hmmm..
>
As I said, we can fix this up incrementally.
-hpa
* [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h
2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
@ 2011-08-05 22:58 ` tip-bot for Borislav Petkov
2011-08-06 0:10 ` H. Peter Anvin
0 siblings, 1 reply; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, hpa, borislav.petkov
Commit-ID: dfb09f9b7ab03fd367740e541a5caf830ed56726
Gitweb: http://git.kernel.org/tip/dfb09f9b7ab03fd367740e541a5caf830ed56726
Author: Borislav Petkov <borislav.petkov@amd.com>
AuthorDate: Fri, 5 Aug 2011 15:15:08 +0200
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:26:44 -0700
x86, amd: Avoid cache aliasing penalties on AMD family 15h
This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.
This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.
This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits to be the same for all mappings of a single shared
library across processes and, in doing so, avoiding instruction cache
aliases.
It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.
This change leaves virtual region address allocation on other families
and/or vendors unaffected.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
Documentation/kernel-parameters.txt | 13 ++++++
arch/x86/include/asm/elf.h | 31 +++++++++++++
arch/x86/kernel/cpu/amd.c | 13 ++++++
arch/x86/kernel/sys_x86_64.c | 81 +++++++++++++++++++++++++++++++++-
arch/x86/mm/mmap.c | 15 ------
arch/x86/vdso/vma.c | 9 ++++
6 files changed, 144 insertions(+), 18 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index aa47be7..af73c03 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -299,6 +299,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
behaviour to be specified. Bit 0 enables warnings,
bit 1 enables fixups, and bit 2 sends a segfault.
+ align_va_addr= [X86-64]
+ Align virtual addresses by clearing slice [14:12] when
+ allocating a VMA at process creation time. This option
+ gives you up to 3% performance improvement on AMD F15h
+ machines (where it is enabled by default) for a
+ CPU-intensive style benchmark, and it can vary highly in
+ a microbenchmark depending on workload and compiler.
+
+ 32: only for 32-bit processes
+ 64: only for 64-bit processes
+ on: enable for both 32- and 64-bit processes
+ off: disable for both 32- and 64-bit processes
+
amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f2ad216..5f962df 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -4,6 +4,7 @@
/*
* ELF register definitions..
*/
+#include <linux/thread_info.h>
#include <asm/ptrace.h>
#include <asm/user.h>
@@ -320,4 +321,34 @@ extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
extern unsigned long arch_randomize_brk(struct mm_struct *mm);
#define arch_randomize_brk arch_randomize_brk
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+#ifdef CONFIG_X86_32
+ return 1;
+#endif
+#ifdef CONFIG_IA32_EMULATION
+ if (test_thread_flag(TIF_IA32))
+ return 1;
+#endif
+ return 0;
+}
+
+/* The first two values are special, do not change. See align_addr() */
+enum align_flags {
+ ALIGN_VA_32 = BIT(0),
+ ALIGN_VA_64 = BIT(1),
+ ALIGN_VDSO = BIT(2),
+ ALIGN_TOPDOWN = BIT(3),
+};
+
+struct va_alignment {
+ int flags;
+ unsigned long mask;
+} ____cacheline_aligned;
+
+extern struct va_alignment va_align;
+extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
#endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b13ed39..b0234bc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
"with P0 frequency!\n");
}
}
+
+ if (c->x86 == 0x15) {
+ unsigned long upperbit;
+ u32 cpuid, assoc;
+
+ cpuid = cpuid_edx(0x80000005);
+ assoc = cpuid >> 16 & 0xff;
+ upperbit = ((cpuid >> 24) << 10) / assoc;
+
+ va_align.mask = (upperbit - 1) & PAGE_MASK;
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+
+ }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index ff14a50..aaa8d09 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,6 +18,72 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
+struct __read_mostly va_alignment va_align = {
+ .flags = -1,
+};
+
+/*
+ * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
+ *
+ * @flags denotes the allocation direction - bottomup or topdown -
+ * or vDSO; see call sites below.
+ */
+unsigned long align_addr(unsigned long addr, struct file *filp,
+ enum align_flags flags)
+{
+ unsigned long tmp_addr;
+
+ /* handle 32- and 64-bit case with a single conditional */
+ if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
+ return addr;
+
+ if (!(current->flags & PF_RANDOMIZE))
+ return addr;
+
+ if (!((flags & ALIGN_VDSO) || filp))
+ return addr;
+
+ tmp_addr = addr;
+
+ /*
+ * We need an address which is <= the original
+ * one only when in topdown direction.
+ */
+ if (!(flags & ALIGN_TOPDOWN))
+ tmp_addr += va_align.mask;
+
+ tmp_addr &= ~va_align.mask;
+
+ return tmp_addr;
+}
+
+static int __init control_va_addr_alignment(char *str)
+{
+ /* guard against enabling this on other CPU families */
+ if (va_align.flags < 0)
+ return 1;
+
+ if (*str == 0)
+ return 1;
+
+ if (*str == '=')
+ str++;
+
+ if (!strcmp(str, "32"))
+ va_align.flags = ALIGN_VA_32;
+ else if (!strcmp(str, "64"))
+ va_align.flags = ALIGN_VA_64;
+ else if (!strcmp(str, "off"))
+ va_align.flags = 0;
+ else if (!strcmp(str, "on"))
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+ else
+ return 0;
+
+ return 1;
+}
+__setup("align_va_addr", control_va_addr_alignment);
+
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, off)
@@ -92,6 +158,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
start_addr = addr;
full_search:
+
+ addr = align_addr(addr, filp, 0);
+
for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
/* At this point: (!vma || addr < vma->vm_end). */
if (end - len < addr) {
@@ -117,6 +186,7 @@ full_search:
mm->cached_hole_size = vma->vm_start - addr;
addr = vma->vm_end;
+ addr = align_addr(addr, filp, 0);
}
}
@@ -161,10 +231,13 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
/* make sure it can fit in the remaining address space */
if (addr > len) {
- vma = find_vma(mm, addr-len);
- if (!vma || addr <= vma->vm_start)
+ unsigned long tmp_addr = align_addr(addr - len, filp,
+ ALIGN_TOPDOWN);
+
+ vma = find_vma(mm, tmp_addr);
+ if (!vma || tmp_addr + len <= vma->vm_start)
/* remember the address as a hint for next time */
- return mm->free_area_cache = addr-len;
+ return mm->free_area_cache = tmp_addr;
}
if (mm->mmap_base < len)
@@ -173,6 +246,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
addr = mm->mmap_base-len;
do {
+ addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+
/*
* Lookup failure means no vma is above this address,
* else if new region fits below vma->vm_start,
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1dab519..d4c0736 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -51,21 +51,6 @@ static unsigned int stack_maxrandom_size(void)
#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
#define MAX_GAP (TASK_SIZE/6*5)
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static int mmap_is_ia32(void)
-{
-#ifdef CONFIG_X86_32
- return 1;
-#endif
-#ifdef CONFIG_IA32_EMULATION
- if (test_thread_flag(TIF_IA32))
- return 1;
-#endif
- return 0;
-}
-
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 7abd2be..caa42ce 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -69,6 +69,15 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
addr = start + (offset << PAGE_SHIFT);
if (addr >= end)
addr = end;
+
+ /*
+ * page-align it here so that get_unmapped_area doesn't
+ * align it wrongfully again to the next page. addr can come in 4K
+ * unaligned here as a result of stack start randomization.
+ */
+ addr = PAGE_ALIGN(addr);
+ addr = align_addr(addr, NULL, ALIGN_VDSO);
+
return addr;
}
* [tip:x86/cpu] x86: Add a BSP cpu_dev helper
2011-08-05 18:01 ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
@ 2011-08-05 22:58 ` tip-bot for Borislav Petkov
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, tglx, hpa, bp, borislav.petkov
Commit-ID: a110b5ec7371592eac856ac5c22dc7b518952d44
Gitweb: http://git.kernel.org/tip/a110b5ec7371592eac856ac5c22dc7b518952d44
Author: Borislav Petkov <bp@amd64.org>
AuthorDate: Fri, 5 Aug 2011 20:01:16 +0200
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:26:49 -0700
x86: Add a BSP cpu_dev helper
Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180116.GB26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
arch/x86/kernel/cpu/common.c | 3 +++
arch/x86/kernel/cpu/cpu.h | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..8ed394a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
filter_cpuid_features(c, false);
setup_smep(c);
+
+ if (this_cpu->c_bsp_init)
+ this_cpu->c_bsp_init(c);
}
void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e765633..1b22dcc 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -18,6 +18,7 @@ struct cpu_dev {
struct cpu_model_info c_models[4];
void (*c_early_init)(struct cpuinfo_x86 *);
+ void (*c_bsp_init)(struct cpuinfo_x86 *);
void (*c_init)(struct cpuinfo_x86 *);
void (*c_identify)(struct cpuinfo_x86 *);
unsigned int (*c_size_cache)(struct cpuinfo_x86 *, unsigned int);
* [tip:x86/cpu] x86, amd: Move BSP code to cpu_dev helper
2011-08-05 18:04 ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
2011-08-05 20:07 ` H. Peter Anvin
@ 2011-08-05 22:59 ` tip-bot for Borislav Petkov
1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-05 22:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, tglx, hpa, bp, borislav.petkov
Commit-ID: 8fa8b035085e7320c15875c1f6b03b290ca2dd66
Gitweb: http://git.kernel.org/tip/8fa8b035085e7320c15875c1f6b03b290ca2dd66
Author: Borislav Petkov <bp@amd64.org>
AuthorDate: Fri, 5 Aug 2011 20:04:09 +0200
Committer: H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 5 Aug 2011 12:32:33 -0700
x86, amd: Move BSP code to cpu_dev helper
Move code which is run once on the BSP during boot into the cpu_dev
helper.
[ hpa: removed bogus cpu_has -> static_cpu_has conversion ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180409.GC26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
arch/x86/kernel/cpu/amd.c | 59 ++++++++++++++++++++++-----------------------
1 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..b6e3e87 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,34 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}
+static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
+{
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
+
+ if (c->x86 == 0x15) {
+ unsigned long upperbit;
+ u32 cpuid, assoc;
+
+ cpuid = cpuid_edx(0x80000005);
+ assoc = cpuid >> 16 & 0xff;
+ upperbit = ((cpuid >> 24) << 10) / assoc;
+
+ va_align.mask = (upperbit - 1) & PAGE_MASK;
+ va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+ }
+}
+
static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
{
early_init_amd_mc(c);
@@ -441,36 +469,6 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
-
- /* We need to do the following only once */
- if (c != &boot_cpu_data)
- return;
-
- if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
- if (c->x86 > 0x10 ||
- (c->x86 == 0x10 && c->x86_model >= 0x2)) {
- u64 val;
-
- rdmsrl(MSR_K7_HWCR, val);
- if (!(val & BIT(24)))
- printk(KERN_WARNING FW_BUG "TSC doesn't count "
- "with P0 frequency!\n");
- }
- }
-
- if (c->x86 == 0x15) {
- unsigned long upperbit;
- u32 cpuid, assoc;
-
- cpuid = cpuid_edx(0x80000005);
- assoc = cpuid >> 16 & 0xff;
- upperbit = ((cpuid >> 24) << 10) / assoc;
-
- va_align.mask = (upperbit - 1) & PAGE_MASK;
- va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
-
- }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
@@ -692,6 +690,7 @@ static const struct cpu_dev __cpuinitconst amd_cpu_dev = {
.c_size_cache = amd_size_cache,
#endif
.c_early_init = early_init_amd,
+ .c_bsp_init = bsp_init_amd,
.c_init = init_amd,
.c_x86_vendor = X86_VENDOR_AMD,
};
* Re: [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h
2011-08-05 22:58 ` [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h tip-bot for Borislav Petkov
@ 2011-08-06 0:10 ` H. Peter Anvin
2011-08-06 12:31 ` [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2011-08-06 0:10 UTC (permalink / raw)
To: mingo, hpa, linux-kernel, tglx, hpa, borislav.petkov; +Cc: linux-tip-commits
On 08/05/2011 03:58 PM, tip-bot for Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index b13ed39..b0234bc 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> "with P0 frequency!\n");
> }
> }
> +
> + if (c->x86 == 0x15) {
> + unsigned long upperbit;
> + u32 cpuid, assoc;
> +
> + cpuid = cpuid_edx(0x80000005);
> + assoc = cpuid >> 16 & 0xff;
> + upperbit = ((cpuid >> 24) << 10) / assoc;
> +
> + va_align.mask = (upperbit - 1) & PAGE_MASK;
> + va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
> +
> + }
> }
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
Breaks all i386 builds:
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'
[the line numbers refer to the entire patchset]
-hpa
* [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch
2011-08-06 0:10 ` H. Peter Anvin
@ 2011-08-06 12:31 ` Borislav Petkov
2011-08-06 23:22 ` [tip:x86/cpu] x86-32, amd: Move va_align definition to unbreak 32-bit build tip-bot for Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2011-08-06 12:31 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Andrew Morton
Cc: Avi Kivity, Andre Przywara, Martin Pohlack, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit
builds with the following error message:
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'
This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/sys_x86_64.c | 4 ----
arch/x86/mm/mmap.c | 5 ++++-
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index aaa8d09..fe7d2da 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,10 +18,6 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
-struct __read_mostly va_alignment va_align = {
- .flags = -1,
-};
-
/*
* Align a virtual address to avoid aliasing in the I$ on AMD F15h.
*
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d4c0736..4b5ba85 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,10 @@
#include <linux/sched.h>
#include <asm/elf.h>
+struct __read_mostly va_alignment va_align = {
+ .flags = -1,
+};
+
static unsigned int stack_maxrandom_size(void)
{
unsigned int max = 0;
@@ -42,7 +46,6 @@ static unsigned int stack_maxrandom_size(void)
return max;
}
-
/*
* Top of mmap area (just below the process stack).
*
--
1.7.4.rc2
* [tip:x86/cpu] x86-32, amd: Move va_align definition to unbreak 32-bit build
2011-08-06 12:31 ` [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch Borislav Petkov
@ 2011-08-06 23:22 ` tip-bot for Borislav Petkov
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Borislav Petkov @ 2011-08-06 23:22 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, borislav.petkov
Commit-ID: 9387f774d61b01ab71bade85e6d0bfab0b3419bd
Gitweb: http://git.kernel.org/tip/9387f774d61b01ab71bade85e6d0bfab0b3419bd
Author: Borislav Petkov <borislav.petkov@amd.com>
AuthorDate: Sat, 6 Aug 2011 14:31:38 +0200
Committer: H. Peter Anvin <hpa@zytor.com>
CommitDate: Sat, 6 Aug 2011 11:44:57 -0700
x86-32, amd: Move va_align definition to unbreak 32-bit build
hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit
builds with the following error message:
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'
This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/sys_x86_64.c | 4 ----
arch/x86/mm/mmap.c | 5 ++++-
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index aaa8d09..fe7d2da 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,10 +18,6 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
-struct __read_mostly va_alignment va_align = {
- .flags = -1,
-};
-
/*
* Align a virtual address to avoid aliasing in the I$ on AMD F15h.
*
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d4c0736..4b5ba85 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,10 @@
#include <linux/sched.h>
#include <asm/elf.h>
+struct __read_mostly va_alignment va_align = {
+ .flags = -1,
+};
+
static unsigned int stack_maxrandom_size(void)
{
unsigned int max = 0;
@@ -42,7 +46,6 @@ static unsigned int stack_maxrandom_size(void)
return max;
}
-
/*
* Top of mmap area (just below the process stack).
*
end of thread, other threads:[~2011-08-06 23:22 UTC | newest]
Thread overview: 17+ messages
2011-08-05 13:15 [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 1/3] " Borislav Petkov
2011-08-05 22:58 ` [tip:x86/cpu] x86, amd: Avoid cache aliasing penalties on AMD family 15h tip-bot for Borislav Petkov
2011-08-06 0:10 ` H. Peter Anvin
2011-08-06 12:31 ` [PATCH] x86, AMD: Fix 32-bit build after cache aliasing patch Borislav Petkov
2011-08-06 23:22 ` [tip:x86/cpu] x86-32, amd: Move va_align definition to unbreak 32-bit build tip-bot for Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 2/3] x86: Add a BSP cpuinit helper Borislav Petkov
2011-08-05 13:15 ` [PATCH -v3.1 3/3] x86, AMD: Move BSP code to " Borislav Petkov
2011-08-05 17:10 ` [PATCH -v3.1 0/3] x86, AMD: Correct F15h IC aliasing issue H. Peter Anvin
2011-08-05 17:55 ` Borislav Petkov
2011-08-05 18:01 ` [PATCH -v3.2 2/3] x86: Add a BSP cpu_dev helper Borislav Petkov
2011-08-05 22:58 ` [tip:x86/cpu] " tip-bot for Borislav Petkov
2011-08-05 18:04 ` [PATCH -v3.2 3/3] x86, AMD: Move BSP code to " Borislav Petkov
2011-08-05 20:07 ` H. Peter Anvin
2011-08-05 22:52 ` Borislav Petkov
2011-08-05 22:56 ` H. Peter Anvin
2011-08-05 22:59 ` [tip:x86/cpu] x86, amd: " tip-bot for Borislav Petkov