All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <greg@kroah.com>
To: Joerg Roedel <joerg.roedel@amd.com>
Cc: Greg KH <gregkh@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@amd64.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Erratum 383 fix for 32 bit x86 kernels
Date: Fri, 24 Sep 2010 09:02:06 -0700	[thread overview]
Message-ID: <20100924160205.GB21235@kroah.com> (raw)
In-Reply-To: <20100924115239.GA9817@amd.com>

On Fri, Sep 24, 2010 at 01:52:40PM +0200, Joerg Roedel wrote:
> Hi Greg,
> 
> here is a patch which I want to ask you to include into the current
> -stable kernels. This patch fixes the occurence of AMD Erratum 383 on
> 32 bit x86 kernels when using CPU hotplug. This patch is a combined
> patch created by cherry-picking (and fixing a small compile error)
> upstream commits:
> 
> 	fd89a137924e0710078c3ae855e7cec1c43cb845
> 	b7d460897739e02f186425b7276e3fdb1595cea7
> 
> The second commit fixes a problem in the first one. The patch I attach
> here is against 2.6.32.22 but should also apply against 2.6.35 (At least
> cherry-picking from 2.6.36-rc gave no conflicts).
> Please consider to include this patch into the -stable kernel series.
> 
> Regards,
> 
> 	Joerg
> 
> >From d12a669119908f1c24a3b5037445c124c312eea5 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <joerg.roedel@amd.com>
> Date: Thu, 22 Jul 2010 11:32:00 +0200
> Subject: [PATCH] x86-32: Fix crashes with CPU hotplug on AMD machines
> 
> This patch fixes machine crashes which occured when heavily
> testing cpu hotplug code on a 32 bit kernel. The kernel
> crashed because these tests triggered AMD erratum 383 which
> resulted in a machine check exception.
> The erratum triggered in the test cases because of the
> following steps:
> 
> 	1. On 32 bit the swapper_pg_dir page table is used
> 	   as the initial page table for booting a secondary
> 	   cpu.
> 
> 	2. To make this work swapper_pg_dir needs a direct
> 	   mapping of physical memory in it (the low
> 	   mappings)
> 
> 	3. Other cpus may use swapper_pg_dir while the low
> 	   mappings are present (when leave_mm is called).
> 
> 	4. This can result in TLB entries for addresses
> 	   below __PAGE_OFFSET to be established due to
> 	   speculative TLB loads. These TLB entries are
> 	   marked global and possibly large.
> 
> 	5. When the CPU which has this entry loaded switches
> 	   to another page table this global TLB entry is not
> 	   flushed.
> 
> 	6. The process accesses an address covered by this
> 	   TLB entry but there is a permission mismatch
> 	   (present TLB entry covers a large global page not
> 	    accessible for userspace).
> 
> 	7. Due to the permission mismatch the hardware
> 	   walks the new page table and establishes a new
> 	   4kb TLB entry. Due to AMD erratum 383 there might
> 	   be a small window of time now where both TLB
> 	   entries are present.
> 
> 	8. After the page walk the hardware does a new TLB
> 	   lookup which may result in two matches. This
> 	   results in a machine check exception which
> 	   signals a TLB multimatch error. The machine
> 	   crashes.
> 
> There are two ways to fix this issue:
> 
> 	1. Always do a global TLB flush when a new cr3 is
> 	   loaded and the old page table was swapper_pg_dir.
> 	   I consider this a hack hard to understand.
> 
> 	2. Do not use swapper_pg_dir to boot secondary cpus.
> 
> This patch implements solution 2. It introduces a
> trampoline_pg_dir which has the same layout as
> swapper_pg_dir with low_mappings. This page table is used as
> the initial page table of the booting cpu. Later in the
> bringup process it switches to swapper_pg_dir and does a
> global tlb flush. This fixes the crashes in our test cases.
> 
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/asm/pgtable_32.h |    1 +
>  arch/x86/include/asm/trampoline.h |    3 +++
>  arch/x86/kernel/head_32.S         |    8 +++++++-
>  arch/x86/kernel/setup.c           |    2 ++
>  arch/x86/kernel/smpboot.c         |   21 ++++++---------------
>  arch/x86/kernel/trampoline.c      |   18 ++++++++++++++++++
>  6 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
> index 2984a25..f686f49 100644
> --- a/arch/x86/include/asm/pgtable_32.h
> +++ b/arch/x86/include/asm/pgtable_32.h
> @@ -26,6 +26,7 @@ struct mm_struct;
>  struct vm_area_struct;
>  
>  extern pgd_t swapper_pg_dir[1024];
> +extern pgd_t trampoline_pg_dir[1024];
>  
>  static inline void pgtable_cache_init(void) { }
>  static inline void check_pgt_cache(void) { }
> diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
> index cb507bb..8f78fdf 100644
> --- a/arch/x86/include/asm/trampoline.h
> +++ b/arch/x86/include/asm/trampoline.h
> @@ -13,14 +13,17 @@ extern unsigned char *trampoline_base;
>  
>  extern unsigned long init_rsp;
>  extern unsigned long initial_code;
> +extern unsigned long initial_page_table;
>  extern unsigned long initial_gs;
>  
>  #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
>  
>  extern unsigned long setup_trampoline(void);
> +extern void __init setup_trampoline_page_table(void);
>  extern void __init reserve_trampoline_memory(void);
>  #else
>  static inline void reserve_trampoline_memory(void) {};
> +extern void __init setup_trampoline_page_table(void) {};
>  #endif /* CONFIG_X86_TRAMPOLINE */

I don't think that last setup_trampoline_page_table() line is correct
here.

Shouldn't it be:
	static inline void setup_trampoline_page_table(void) {};
instead?

Otherwise I get the following error building the .32 code with this
patch:
	  CC      arch/x86/kernel/setup.o
	  arch/x86/kernel/setup.c: In function ‘setup_arch’:
	  arch/x86/kernel/setup.c:1001:2: error: implicit declaration of function ‘setup_trampoline_page_table’

Is this really how the code looks upstream?

Hm, even with changing the function prototype, I still get an error
building on the .32-stable tree on x86-64, so I'm dropping this patch
from there.

Also, it didn't apply cleanly to .32-stable, I had to apply this chunk
by hand, no big deal.

So, why not I just take the original git commits that are in Linus's
tree?  That should work, right?  If so, do I just need to use those two
above-mentioned commits?  Or something else?  I prefer taking the
original commits as it makes spelunking over time much easier.

thanks,

greg k-h


>  
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 37c3d4b..75e3981 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -328,7 +328,7 @@ ENTRY(startup_32_smp)
>  /*
>   * Enable paging
>   */
> -	movl $pa(swapper_pg_dir),%eax
> +	movl pa(initial_page_table), %eax
>  	movl %eax,%cr3		/* set the page table pointer.. */
>  	movl %cr0,%eax
>  	orl  $X86_CR0_PG,%eax
> @@ -608,6 +608,8 @@ ignore_int:
>  .align 4
>  ENTRY(initial_code)
>  	.long i386_start_kernel
> +ENTRY(initial_page_table)
> +	.long pa(swapper_pg_dir)
>  
>  /*
>   * BSS section
> @@ -623,6 +625,10 @@ ENTRY(swapper_pg_dir)
>  #endif
>  swapper_pg_fixmap:
>  	.fill 1024,4,0
> +#ifdef CONFIG_X86_TRAMPOLINE
> +ENTRY(trampoline_pg_dir)
> +	.fill 1024,4,0
> +#endif
>  ENTRY(empty_zero_page)
>  	.fill 4096,1,0
>  
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index b4ae4ac..6600cfd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1008,6 +1008,8 @@ void __init setup_arch(char **cmdline_p)
>  	paging_init();
>  	x86_init.paging.pagetable_setup_done(swapper_pg_dir);
>  
> +	setup_trampoline_page_table();
> +
>  	tboot_probe();
>  
>  #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index c4f33b2..7b05eb1 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -73,7 +73,6 @@
>  
>  #ifdef CONFIG_X86_32
>  u8 apicid_2_node[MAX_APICID];
> -static int low_mappings;
>  #endif
>  
>  /* State of each CPU */
> @@ -300,8 +299,8 @@ notrace static void __cpuinit start_secondary(void *unused)
>  	}
>  
>  #ifdef CONFIG_X86_32
> -	while (low_mappings)
> -		cpu_relax();
> +	/* switch away from the trampoline page-table */
> +	load_cr3(swapper_pg_dir);
>  	__flush_tlb_all();
>  #endif
>  
> @@ -765,6 +764,10 @@ do_rest:
>  	initial_code = (unsigned long)start_secondary;
>  	stack_start.sp = (void *) c_idle.idle->thread.sp;
>  
> +#ifdef CONFIG_X86_32
> +	initial_page_table = __pa(&trampoline_pg_dir);
> +#endif
> +
>  	/* start_ip had better be page-aligned! */
>  	start_ip = setup_trampoline();
>  
> @@ -894,20 +897,8 @@ int __cpuinit native_cpu_up(unsigned int cpu)
>  
>  	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
>  
> -#ifdef CONFIG_X86_32
> -	/* init low mem mapping */
> -	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> -		min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> -	flush_tlb_all();
> -	low_mappings = 1;
> -
>  	err = do_boot_cpu(apicid, cpu);
>  
> -	zap_low_mappings(false);
> -	low_mappings = 0;
> -#else
> -	err = do_boot_cpu(apicid, cpu);
> -#endif
>  	if (err) {
>  		pr_debug("do_boot_cpu failed %d\n", err);
>  		return -EIO;
> diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
> index c652ef6..a874495 100644
> --- a/arch/x86/kernel/trampoline.c
> +++ b/arch/x86/kernel/trampoline.c
> @@ -1,6 +1,7 @@
>  #include <linux/io.h>
>  
>  #include <asm/trampoline.h>
> +#include <asm/pgtable.h>
>  #include <asm/e820.h>
>  
>  #if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
> @@ -37,3 +38,20 @@ unsigned long __trampinit setup_trampoline(void)
>  	memcpy(trampoline_base, trampoline_data, TRAMPOLINE_SIZE);
>  	return virt_to_phys(trampoline_base);
>  }
> +
> +void __init setup_trampoline_page_table(void)
> +{
> +#ifdef CONFIG_X86_32
> +	/* Copy kernel address range */
> +	clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
> +			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> +			min_t(unsigned long, KERNEL_PGD_PTRS,
> +			      KERNEL_PGD_BOUNDARY));
> +
> +	/* Initialize low mappings */
> +	clone_pgd_range(trampoline_pg_dir,
> +			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> +			min_t(unsigned long, KERNEL_PGD_PTRS,
> +			      KERNEL_PGD_BOUNDARY));
> +#endif
> +}
> -- 
> 1.7.0.4
> 
> -- 
> AMD Operating System Research Center
> 
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
> General Managers: Alberto Bozzo, Andrew Bowd
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

  parent reply	other threads:[~2010-09-24 16:03 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-24 11:52 Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
2010-09-24 11:58 ` Joerg Roedel
2010-09-24 13:47 ` Greg KH
2010-09-24 13:53   ` Roedel, Joerg
2010-09-24 16:02 ` Greg KH [this message]
2010-09-24 16:24   ` Borislav Petkov
2010-09-24 16:29     ` Greg KH
2010-10-22 16:18     ` Greg KH
2010-10-22 16:20       ` Greg KH
2010-10-23  8:26         ` Borislav Petkov
2010-11-11 13:56         ` [PATCH 0/3] " Joerg Roedel
2010-11-11 13:56         ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
2011-01-19  0:39           ` Konrad Rzeszutek Wilk
2011-01-19  7:19             ` Borislav Petkov
2011-01-19 15:52               ` Konrad Rzeszutek Wilk
2010-11-11 13:56         ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
2010-11-11 14:11           ` Greg KH
2010-11-11 14:13             ` Greg KH
2010-11-11 14:17               ` Roedel, Joerg
2010-11-11 15:16               ` [PATCH 0/3] Erratum 383 fix for 32 bit x86 kernels Joerg Roedel
2010-11-11 15:16               ` [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir Joerg Roedel
2010-12-07 21:05                 ` Greg KH
2010-12-08  3:06                   ` Jeremy Fitzhardinge
2010-12-08  4:15                     ` Greg KH
2010-12-08  9:34                       ` Ian Campbell
2010-12-08 11:58                         ` Borislav Petkov
2010-12-08 15:21                           ` Jeremy Fitzhardinge
2010-11-11 15:16               ` [PATCH 2/3] x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline Joerg Roedel
2010-12-07 21:06                 ` [stable] " Greg KH
2010-11-11 15:16               ` [PATCH 3/3] x86-32: Fix dummy trampoline-related inline stubs Joerg Roedel
2010-12-07 21:07                 ` Greg KH
2010-11-11 13:56         ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100924160205.GB21235@kroah.com \
    --to=greg@kroah.com \
    --cc=bp@amd64.org \
    --cc=gregkh@suse.de \
    --cc=hpa@zytor.com \
    --cc=joerg.roedel@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.