* [GIT PULL 3.1.x-stable] AMD F15h cache aliasing fixes
From: Borislav Petkov @ 2011-11-04 11:22 UTC
To: linux-stable; +Cc: Greg Kroah-Hartman, x86, LKML
Hi Greg,
as previously discussed, here are the four patches needed to address
the issue in $Subject. I'm sending them as individual mails, but I also
have them in git-pullable form in case you'd like to verify them:
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git l1i-3.1.x
Here are the patches on the branch above along with SHA1s (from the
bottom up):
388e48924e2a891c42262ce2a8bc2479bd5984ec x86-32, amd: Move va_align definition to unbreak 32-bit build
31743a0af6a31c82897eaa55b81b41a8d5e48094 x86, amd: Move BSP code to cpu_dev helper
277d13f2fdd7af1cc785d627835c67af2865defa x86: Add a BSP cpu_dev helper
4143701b737dfb1443be126910330e570d4041a7 x86, amd: Avoid cache aliasing penalties on AMD family 15h
Please apply,
thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

* [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Borislav Petkov @ 2011-11-04 11:26 UTC
To: linux-stable; +Cc: Greg Kroah-Hartman, X86-ML, LKML

From: Borislav Petkov <borislav.petkov@amd.com>

Upstream commit: dfb09f9b7ab03fd367740e541a5caf830ed56726

This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.

This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.

This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits to be the same for all mappings of a single shared
library across processes and, in doing so, avoiding instruction cache
aliases.

It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or for both, or be completely disabled.

This change leaves virtual region address allocation on other families
and/or vendors unaffected.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 Documentation/kernel-parameters.txt |   13 ++++++
 arch/x86/include/asm/elf.h          |   31 +++++++++++++
 arch/x86/kernel/cpu/amd.c           |   13 ++++++
 arch/x86/kernel/sys_x86_64.c        |   81 +++++++++++++++++++++++++++++++++-
 arch/x86/mm/mmap.c                  |   15 ------
 arch/x86/vdso/vma.c                 |    9 ++++
 6 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d6e6724..66a1a10 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -306,6 +306,19 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			behaviour to be specified. Bit 0 enables warnings,
 			bit 1 enables fixups, and bit 2 sends a segfault.
 
+	align_va_addr=	[X86-64]
+			Align virtual addresses by clearing slice [14:12] when
+			allocating a VMA at process creation time. This option
+			gives you up to 3% performance improvement on AMD F15h
+			machines (where it is enabled by default) for a
+			CPU-intensive style benchmark, and it can vary highly in
+			a microbenchmark depending on workload and compiler.
+
+			1: only for 32-bit processes
+			2: only for 64-bit processes
+			on: enable for both 32- and 64-bit processes
+			off: disable for both 32- and 64-bit processes
+
 	amd_iommu=	[HW,X86-84]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f2ad216..5f962df 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -4,6 +4,7 @@
 /*
  * ELF register definitions..
  */
+#include <linux/thread_info.h>
 #include <asm/ptrace.h>
 #include <asm/user.h>
@@ -320,4 +321,34 @@ extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
 extern unsigned long arch_randomize_brk(struct mm_struct *mm);
 #define arch_randomize_brk arch_randomize_brk
 
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+#ifdef CONFIG_X86_32
+	return 1;
+#endif
+#ifdef CONFIG_IA32_EMULATION
+	if (test_thread_flag(TIF_IA32))
+		return 1;
+#endif
+	return 0;
+}
+
+/* The first two values are special, do not change. See align_addr() */
+enum align_flags {
+	ALIGN_VA_32	= BIT(0),
+	ALIGN_VA_64	= BIT(1),
+	ALIGN_VDSO	= BIT(2),
+	ALIGN_TOPDOWN	= BIT(3),
+};
+
+struct va_alignment {
+	int flags;
+	unsigned long mask;
+} ____cacheline_aligned;
+
+extern struct va_alignment va_align;
+extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
 #endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b13ed39..b0234bc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -458,6 +458,19 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 				"with P0 frequency!\n");
 		}
 	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid = cpuid_edx(0x80000005);
+		assoc = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask = (upperbit - 1) & PAGE_MASK;
+		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+
+	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index ff14a50..aaa8d09 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,6 +18,72 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
+/*
+ * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
+ *
+ * @flags denotes the allocation direction - bottomup or topdown -
+ * or vDSO; see call sites below.
+ */
+unsigned long align_addr(unsigned long addr, struct file *filp,
+			 enum align_flags flags)
+{
+	unsigned long tmp_addr;
+
+	/* handle 32- and 64-bit case with a single conditional */
+	if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
+		return addr;
+
+	if (!(current->flags & PF_RANDOMIZE))
+		return addr;
+
+	if (!((flags & ALIGN_VDSO) || filp))
+		return addr;
+
+	tmp_addr = addr;
+
+	/*
+	 * We need an address which is <= than the original
+	 * one only when in topdown direction.
+	 */
+	if (!(flags & ALIGN_TOPDOWN))
+		tmp_addr += va_align.mask;
+
+	tmp_addr &= ~va_align.mask;
+
+	return tmp_addr;
+}
+
+static int __init control_va_addr_alignment(char *str)
+{
+	/* guard against enabling this on other CPU families */
+	if (va_align.flags < 0)
+		return 1;
+
+	if (*str == 0)
+		return 1;
+
+	if (*str == '=')
+		str++;
+
+	if (!strcmp(str, "32"))
+		va_align.flags = ALIGN_VA_32;
+	else if (!strcmp(str, "64"))
+		va_align.flags = ALIGN_VA_64;
+	else if (!strcmp(str, "off"))
+		va_align.flags = 0;
+	else if (!strcmp(str, "on"))
+		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+	else
+		return 0;
+
+	return 1;
+}
+__setup("align_va_addr", control_va_addr_alignment);
+
 SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 		unsigned long, prot, unsigned long, flags,
 		unsigned long, fd, unsigned long, off)
@@ -92,6 +158,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 		start_addr = addr;
 
 full_search:
+
+	addr = align_addr(addr, filp, 0);
+
 	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
 		/* At this point:  (!vma || addr < vma->vm_end). */
 		if (end - len < addr) {
@@ -117,6 +186,7 @@ full_search:
 			mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
+		addr = align_addr(addr, filp, 0);
 	}
 }
 
@@ -161,10 +231,13 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* make sure it can fit in the remaining address space */
 	if (addr > len) {
-		vma = find_vma(mm, addr-len);
-		if (!vma || addr <= vma->vm_start)
+		unsigned long tmp_addr = align_addr(addr - len, filp,
+						    ALIGN_TOPDOWN);
+
+		vma = find_vma(mm, tmp_addr);
+		if (!vma || tmp_addr + len <= vma->vm_start)
 			/* remember the address as a hint for next time */
-			return mm->free_area_cache = addr-len;
+			return mm->free_area_cache = tmp_addr;
 	}
 
 	if (mm->mmap_base < len)
@@ -173,6 +246,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	addr = mm->mmap_base-len;
 
 	do {
+		addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+
 		/*
 		 * Lookup failure means no vma is above this address,
 		 * else if new region fits below vma->vm_start,
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1dab519..d4c0736 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -51,21 +51,6 @@ static unsigned int stack_maxrandom_size(void)
 #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
 #define MAX_GAP (TASK_SIZE/6*5)
 
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static int mmap_is_ia32(void)
-{
-#ifdef CONFIG_X86_32
-	return 1;
-#endif
-#ifdef CONFIG_IA32_EMULATION
-	if (test_thread_flag(TIF_IA32))
-		return 1;
-#endif
-	return 0;
-}
-
 static int mmap_is_legacy(void)
 {
 	if (current->personality & ADDR_COMPAT_LAYOUT)
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 316fbca..153407c 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -89,6 +89,15 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 	addr = start + (offset << PAGE_SHIFT);
 	if (addr >= end)
 		addr = end;
+
+	/*
+	 * page-align it here so that get_unmapped_area doesn't
+	 * align it wrongfully again to the next page. addr can come in 4K
+	 * unaligned here as a result of stack start randomization.
+	 */
+	addr = PAGE_ALIGN(addr);
+	addr = align_addr(addr, NULL, ALIGN_VDSO);
+
 	return addr;
 }
-- 
1.7.8.rc0
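
To make the rounding in align_addr() above concrete, here is the same
arithmetic replayed as a minimal user-space sketch. It assumes the F15h
mask value computed in early_init_amd() above (0x7000, i.e. VA bits
[14:12]); the align_up/align_down helper names are illustrative only,
not kernel API:

#include <stdio.h>

#define PAGE_MASK	(~0xfffUL)
/* assumed F15h case: 64K 2-way L1I => 32K per way => mask 0x7000 */
#define VA_MASK		((0x8000UL - 1) & PAGE_MASK)

/* bottom-up allocation: round up to the boundary at or above addr */
static unsigned long align_up(unsigned long addr)
{
	return (addr + VA_MASK) & ~VA_MASK;
}

/* top-down allocation: round down to the boundary at or below addr */
static unsigned long align_down(unsigned long addr)
{
	return addr & ~VA_MASK;
}

int main(void)
{
	unsigned long addr = 0x7f1234567000UL;	/* page-aligned example VA */

	printf("up:   %#lx\n", align_up(addr));		/* 0x7f1234568000 */
	printf("down: %#lx\n", align_down(addr));	/* 0x7f1234560000 */
	return 0;
}

This is why align_addr() adds the mask only when ALIGN_TOPDOWN is not
set: rounding up keeps a bottom-up result at or above the hole being
searched, while rounding down keeps a top-down result at or below it.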

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Greg KH @ 2011-11-04 15:25 UTC
To: Borislav Petkov; +Cc: linux-stable, X86-ML, LKML

On Fri, Nov 04, 2011 at 12:26:32PM +0100, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@amd.com>
>
> Upstream commit: dfb09f9b7ab03fd367740e541a5caf830ed56726
>
> [...]
>
>  Documentation/kernel-parameters.txt |   13 ++++++
>  arch/x86/include/asm/elf.h          |   31 +++++++++++++
>  arch/x86/kernel/cpu/amd.c           |   13 ++++++
>  arch/x86/kernel/sys_x86_64.c        |   81 +++++++++++++++++++++++++++++++++-
>  arch/x86/mm/mmap.c                  |   15 ------
>  arch/x86/vdso/vma.c                 |    9 ++++
>  6 files changed, 144 insertions(+), 18 deletions(-)

I really feel nervous adding this patch to the -stable tree(s). It's
bigger than "just a bugfix" and it adds new functionality.

I'm aware that it is needed for your new hardware, which is great, but
it doesn't really follow the Documentation/stable_kernel_rules.txt
requirements, does it?

I need an ACK from the x86 maintainers before I'm going to be
comfortable adding this, and then the other, patches in this series.

Peter, Ingo, Thomas, your opinions?

thanks,

greg k-h

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Greg KH @ 2011-11-16 23:21 UTC
To: Borislav Petkov; +Cc: linux-stable, X86-ML, LKML

On Fri, Nov 04, 2011 at 08:25:32AM -0700, Greg KH wrote:
> On Fri, Nov 04, 2011 at 12:26:32PM +0100, Borislav Petkov wrote:
> > From: Borislav Petkov <borislav.petkov@amd.com>
> >
> > Upstream commit: dfb09f9b7ab03fd367740e541a5caf830ed56726
> >
> > [...]
>
> I really feel nervous adding this patch to the -stable tree(s). It's
> bigger than "just a bugfix" and it adds new functionality.
>
> I'm aware that it is needed for your new hardware, which is great, but
> it doesn't really follow the Documentation/stable_kernel_rules.txt
> requirements, does it?
>
> I need an ACK from the x86 maintainers before I'm going to be
> comfortable adding this, and then the other, patches in this series.
>
> Peter, Ingo, Thomas, your opinions?

Ping?

anyone?

greg k-h

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Borislav Petkov @ 2011-11-17 11:48 UTC
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner
Cc: Greg KH, linux-stable, X86-ML, LKML

Ok, let's ping them directly.

x86 guys, do you have an issue with me backporting the aliasing fix to
3.x stable? I know it doesn't completely adhere to the -stable rules,
since it is not a regression fix. Well, think of it as a hardware
regression, with me trying to cover all bases :-).

Thanks.

On Wed, Nov 16, 2011 at 03:21:36PM -0800, Greg KH wrote:
> On Fri, Nov 04, 2011 at 08:25:32AM -0700, Greg KH wrote:
> > On Fri, Nov 04, 2011 at 12:26:32PM +0100, Borislav Petkov wrote:
> > > From: Borislav Petkov <borislav.petkov@amd.com>
> > >
> > > Upstream commit: dfb09f9b7ab03fd367740e541a5caf830ed56726
> > >
> > > [...]
> >
> > I really feel nervous adding this patch to the -stable tree(s). It's
> > bigger than "just a bugfix" and it adds new functionality.
> >
> > I'm aware that it is needed for your new hardware, which is great, but
> > it doesn't really follow the Documentation/stable_kernel_rules.txt
> > requirements, does it?
> >
> > I need an ACK from the x86 maintainers before I'm going to be
> > comfortable adding this, and then the other, patches in this series.
> >
> > Peter, Ingo, Thomas, your opinions?
>
> Ping?
>
> anyone?
>
> greg k-h

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: H. Peter Anvin @ 2011-11-18 17:54 UTC
To: Greg KH; +Cc: Borislav Petkov, linux-stable, X86-ML, LKML

On 11/16/2011 03:21 PM, Greg KH wrote:
>
> Ping?
>
> anyone?
>
> greg k-h

Looking at it now...

	-hpa

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Greg KH @ 2011-11-28 0:01 UTC
To: H. Peter Anvin; +Cc: Borislav Petkov, linux-stable, X86-ML, LKML

On Fri, Nov 18, 2011 at 09:54:33AM -0800, H. Peter Anvin wrote:
> On 11/16/2011 03:21 PM, Greg KH wrote:
> >
> > Ping?
> >
> > anyone?
> >
> > greg k-h
>
> Looking at it now...

Any thoughts?

greg k-h

* Re: [PATCH 1/4] x86, amd: Avoid cache aliasing penalties on AMD family 15h
From: Greg KH @ 2011-12-02 23:45 UTC
To: H. Peter Anvin; +Cc: Borislav Petkov, linux-stable, X86-ML, LKML

On Mon, Nov 28, 2011 at 09:01:03AM +0900, Greg KH wrote:
> On Fri, Nov 18, 2011 at 09:54:33AM -0800, H. Peter Anvin wrote:
> > On 11/16/2011 03:21 PM, Greg KH wrote:
> > >
> > > Ping?
> > >
> > > anyone?
> > >
> > > greg k-h
> >
> > Looking at it now...
>
> Any thoughts?

Ok, given the long time here with no response, I'm dropping these from
my queue.

Borislav, if you get the x86 maintainers' ack, please feel free to
resend them.

greg k-h

* [PATCH 2/4] x86: Add a BSP cpu_dev helper
From: Borislav Petkov @ 2011-11-04 11:26 UTC
To: linux-stable; +Cc: Greg Kroah-Hartman, X86-ML, LKML

Upstream commit: a110b5ec7371592eac856ac5c22dc7b518952d44

Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180116.GB26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c |    3 +++
 arch/x86/kernel/cpu/cpu.h    |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6218439..ec63df5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,6 +681,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	filter_cpuid_features(c, false);
 
 	setup_smep(c);
+
+	if (this_cpu->c_bsp_init)
+		this_cpu->c_bsp_init(c);
 }
 
 void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e765633..1b22dcc 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -18,6 +18,7 @@ struct cpu_dev {
 	struct		cpu_model_info c_models[4];
 
 	void            (*c_early_init)(struct cpuinfo_x86 *);
+	void		(*c_bsp_init)(struct cpuinfo_x86 *);
 	void		(*c_init)(struct cpuinfo_x86 *);
 	void		(*c_identify)(struct cpuinfo_x86 *);
 	unsigned int	(*c_size_cache)(struct cpuinfo_x86 *, unsigned int);
-- 
1.7.8.rc0
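
The intended use of the new hook is exactly what the next patch wires
up for AMD. Schematically, a vendor's cpu_dev simply gains one more
initializer -- a sketch with placeholder names (vendor_bsp_init and the
other callbacks here are not real kernel symbols):

static void __cpuinit vendor_bsp_init(struct cpuinfo_x86 *c)
{
	/* one-time setup; runs on the boot CPU from early_identify_cpu() */
}

static const struct cpu_dev __cpuinitconst vendor_cpu_dev = {
	.c_early_init	= vendor_early_init,	/* every CPU, early */
	.c_bsp_init	= vendor_bsp_init,	/* boot CPU only, once */
	.c_init		= vendor_init,		/* every CPU, later */
};

Because early_identify_cpu() itself runs only once, on the boot
processor, a c_bsp_init implementation needs no c != &boot_cpu_data
check of its own -- which is what the next patch exploits.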

* [PATCH 3/4] x86, amd: Move BSP code to cpu_dev helper
From: Borislav Petkov @ 2011-11-04 11:26 UTC
To: linux-stable; +Cc: Greg Kroah-Hartman, X86-ML, LKML

Upstream commit: 8fa8b035085e7320c15875c1f6b03b290ca2dd66

Move code which is run once on the BSP during boot into the cpu_dev
helper.

[ hpa: removed bogus cpu_has -> static_cpu_has conversion ]

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180409.GC26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/amd.c |   59 ++++++++++++++++++++++-----------------------
 1 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b0234bc..b6e3e87 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -410,6 +410,34 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
 #endif
 }
 
+static void __cpuinit bsp_init_amd(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+		if (c->x86 > 0x10 ||
+		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+			u64 val;
+
+			rdmsrl(MSR_K7_HWCR, val);
+			if (!(val & BIT(24)))
+				printk(KERN_WARNING FW_BUG "TSC doesn't count "
+					"with P0 frequency!\n");
+		}
+	}
+
+	if (c->x86 == 0x15) {
+		unsigned long upperbit;
+		u32 cpuid, assoc;
+
+		cpuid = cpuid_edx(0x80000005);
+		assoc = cpuid >> 16 & 0xff;
+		upperbit = ((cpuid >> 24) << 10) / assoc;
+
+		va_align.mask = (upperbit - 1) & PAGE_MASK;
+		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
+	}
+}
 
 static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 {
 	early_init_amd_mc(c);
@@ -441,36 +469,6 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
 	}
 #endif
-
-	/* We need to do the following only once */
-	if (c != &boot_cpu_data)
-		return;
-
-	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
-
-		if (c->x86 > 0x10 ||
-		    (c->x86 == 0x10 && c->x86_model >= 0x2)) {
-			u64 val;
-
-			rdmsrl(MSR_K7_HWCR, val);
-			if (!(val & BIT(24)))
-				printk(KERN_WARNING FW_BUG "TSC doesn't count "
-					"with P0 frequency!\n");
-		}
-	}
-
-	if (c->x86 == 0x15) {
-		unsigned long upperbit;
-		u32 cpuid, assoc;
-
-		cpuid = cpuid_edx(0x80000005);
-		assoc = cpuid >> 16 & 0xff;
-		upperbit = ((cpuid >> 24) << 10) / assoc;
-
-		va_align.mask = (upperbit - 1) & PAGE_MASK;
-		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
-
-	}
 }
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
@@ -692,6 +690,7 @@ static const struct cpu_dev __cpuinitconst amd_cpu_dev = {
 	.c_size_cache	= amd_size_cache,
 #endif
 	.c_early_init	= early_init_amd,
+	.c_bsp_init	= bsp_init_amd,
 	.c_init		= init_amd,
 	.c_x86_vendor	= X86_VENDOR_AMD,
 };
-- 
1.7.8.rc0
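
To see how bsp_init_amd() arrives at the 0x7000 mask used throughout
this series, here is the same computation replayed in user space. The
EDX value is a made-up but representative encoding of CPUID leaf
0x80000005 for an F15h part (64K, 2-way L1I), not one read from real
hardware:

#include <stdio.h>

#define PAGE_MASK (~0xfffUL)

int main(void)
{
	/* CPUID 0x80000005 EDX: bits [31:24] L1I size in KB = 64,
	 * [23:16] associativity = 2, [15:8] lines per tag = 1,
	 * [7:0] line size = 64 bytes */
	unsigned int edx = 0x40020140;

	unsigned int assoc = edx >> 16 & 0xff;			/* 2 */
	unsigned long upperbit = ((edx >> 24) << 10) / assoc;	/* 0x8000 */
	unsigned long mask = (upperbit - 1) & PAGE_MASK;	/* 0x7000 */

	printf("assoc=%u upperbit=%#lx mask=%#lx\n", assoc, upperbit, mask);
	return 0;
}

64K split two ways gives 32K per way, so the cache index needs 15 bits
of virtual address; the low 12 are pinned by the 4K page offset, which
leaves exactly the [14:12] slice the commit message talks about.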

* [PATCH 4/4] x86-32, amd: Move va_align definition to unbreak 32-bit build
From: Borislav Petkov @ 2011-11-04 11:26 UTC
To: linux-stable; +Cc: Greg Kroah-Hartman, X86-ML, LKML

From: Borislav Petkov <borislav.petkov@amd.com>

Upstream commit: 9387f774d61b01ab71bade85e6d0bfab0b3419bd

hpa reported that dfb09f9b7ab03fd367740e541a5caf830ed56726 breaks 32-bit
builds with the following error message:

/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'

This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/sys_x86_64.c |    4 ----
 arch/x86/mm/mmap.c           |    5 ++++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index aaa8d09..fe7d2da 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -18,10 +18,6 @@
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
 
-struct __read_mostly va_alignment va_align = {
-	.flags = -1,
-};
-
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
  *
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index d4c0736..4b5ba85 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,10 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+struct __read_mostly va_alignment va_align = {
+	.flags = -1,
+};
+
 static unsigned int stack_maxrandom_size(void)
 {
 	unsigned int max = 0;
@@ -42,7 +46,6 @@ static unsigned int stack_maxrandom_size(void)
 	return max;
 }
 
-
 /*
  * Top of mmap area (just below the process stack).
  *
-- 
1.7.8.rc0

* Re: [GIT PULL 3.1.x-stable] AMD F15h cache aliasing fixes
From: Greg KH @ 2011-11-04 15:23 UTC
To: Borislav Petkov; +Cc: linux-stable, x86, LKML

On Fri, Nov 04, 2011 at 12:22:10PM +0100, Borislav Petkov wrote:
> Hi Greg,
>
> as previously discussed, here are the four patches needed to address
> the issue in $Subject. I'm sending them as individual mails, but I also
> have them in git-pullable form in case you'd like to verify them:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git l1i-3.1.x

I can't take a pull request, your emails are fine.

I'll comment on them individually, in the patches.

greg k-h

* [GIT PULL 3.0.x-stable] AMD F15h cache aliasing fixes
From: Borislav Petkov @ 2011-11-04 11:11 UTC
To: stable; +Cc: Greg Kroah-Hartman, x86, LKML

Hi Greg,

as previously discussed, here are the four patches needed to address
the issue in $Subject. I'm sending them as individual mails, but I also
have them in git-pullable form in case you'd like to verify them:

git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git l1i-3.0.x

Here are the patches on the branch above along with SHA1s (from the
bottom up):

bf27c8742db1e2c01bff6e8e2cce3b14d5d6cca5 x86-32, amd: Move va_align definition to unbreak 32-bit build
57d8c29f231049d7b10a497c8afa42dba4672a70 x86, amd: Move BSP code to cpu_dev helper
28b1d99bcde97b468585bdf6c94fa51931d2a1aa x86: Add a BSP cpu_dev helper
5c303905f98ab4e378488c919cc60483d59002ff x86, amd: Avoid cache aliasing penalties on AMD family 15h

Please apply,
thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551