* init and zap low address mappings on demand for cpu hotplug
@ 2005-09-21 20:57 Ashok Raj
2005-09-21 21:11 ` Andrew Morton
2005-09-22 9:48 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: Ashok Raj @ 2005-09-21 20:57 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm, ak, suresh.b.siddha, discuss
Hi,
To simplify cpu hotplug we did not zap the low memory mappings, since we would need
them after boot to bring up a new cpu. This caused bad effects when
Suresh was testing some new code. More below.
Andrew: Please consider for -mm
--
Cheers,
Ashok Raj
- Open Source Technology Center
We need to dynamically map and unmap the low address mappings on demand.
Originally we left these low address mappings in place since they were required
for cpu hotplug. Due to the following sighting, we now do this dynamically on
demand, just before we bring up a cpu, and then zap them when the secondary
boot is complete. We defer the zap during first boot until the smpboot
process is complete.
From Suresh:
Identity-mapped low address mappings cause corruption when the udev process is
spawned early in boot. Since these low mappings are mapped global, cr3 writes
during context switches to the udev process were corrupting memory.
Thanks to Suresh for identifying the problem and helping come up with the fix.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Suresh Siddha <suresh.siddha@intel.com>
---------------------------------------------------
arch/x86_64/kernel/smpboot.c | 11 ++++++++---
arch/x86_64/mm/init.c | 24 +++++++++++++++++++++---
include/asm-x86_64/smp.h | 3 ++-
3 files changed, 31 insertions(+), 7 deletions(-)
Index: linux-2.6.14-rc1-mm1/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.14-rc1-mm1.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.14-rc1-mm1/arch/x86_64/kernel/smpboot.c
@@ -748,6 +748,12 @@ do_rest:
cpu_pda[cpu].pcurrent = c_idle.idle;
+ /*
+ * Re-establish low mappings to facilitate boot.
+ * zap it back when boot process is complete.
+ */
+ init_low_mappings();
+
start_rip = setup_trampoline();
init_rsp = c_idle.idle->thread.rsp;
@@ -829,6 +835,7 @@ do_rest:
#endif
}
}
+ zap_low_mappings(0);
if (boot_error) {
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
clear_bit(cpu, &cpu_initialized); /* was set by cpu_init() */
@@ -1067,9 +1074,7 @@ int __cpuinit __cpu_up(unsigned int cpu)
*/
void __init smp_cpus_done(unsigned int max_cpus)
{
-#ifndef CONFIG_HOTPLUG_CPU
- zap_low_mappings();
-#endif
+ zap_low_mappings(1);
smp_cleanup_boot();
#ifdef CONFIG_X86_IO_APIC
Index: linux-2.6.14-rc1-mm1/arch/x86_64/mm/init.c
===================================================================
--- linux-2.6.14-rc1-mm1.orig/arch/x86_64/mm/init.c
+++ linux-2.6.14-rc1-mm1/arch/x86_64/mm/init.c
@@ -311,11 +311,29 @@ void __init init_memory_mapping(unsigned
}
extern struct x8664_pda cpu_pda[NR_CPUS];
+__cpuinitdata static int zap_low_first_time = 1;
/* Assumes all CPUs still execute in init_mm */
-void zap_low_mappings(void)
+__cpuinit void init_low_mappings(void)
{
- pgd_t *pgd = pgd_offset_k(0UL);
+ if (!zap_low_first_time)
+ set_pgd(pgd_offset_k(0UL), *pgd_offset_k(PAGE_OFFSET));
+}
+
+/*
+ * mode
+ * 0 indicates its for __cpu_up to kick an AP into boot sequence.
+ * 1 indicates completion os smp boot process, so we can zap the low
+ * until there is need to bring a cpu up again.
+ */
+__cpuinit void zap_low_mappings(int mode)
+{
+ pgd_t *pgd;
+ if (!mode && zap_low_first_time)
+ return;
+
+ pgd = pgd_offset_k(0UL);
+ zap_low_first_time = 0;
pgd_clear(pgd);
flush_tlb_all();
}
@@ -481,7 +499,7 @@ void __init mem_init(void)
* the WP-bit has been tested.
*/
#ifndef CONFIG_SMP
- zap_low_mappings();
+ zap_low_mappings(1);
#endif
}
Index: linux-2.6.14-rc1-mm1/include/asm-x86_64/smp.h
===================================================================
--- linux-2.6.14-rc1-mm1.orig/include/asm-x86_64/smp.h
+++ linux-2.6.14-rc1-mm1/include/asm-x86_64/smp.h
@@ -47,7 +47,8 @@ extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
-extern void zap_low_mappings(void);
+extern void zap_low_mappings(int mode);
+extern void init_low_mappings(void);
void smp_stop_cpu(void);
extern int smp_call_function_single(int cpuid, void (*func) (void *info),
void *info, int retry, int wait);
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-21 20:57 init and zap low address mappings on demand for cpu hotplug Ashok Raj
@ 2005-09-21 21:11 ` Andrew Morton
2005-09-21 22:39 ` Ashok Raj
2005-09-22 9:48 ` Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2005-09-21 21:11 UTC (permalink / raw)
To: Ashok Raj; +Cc: linux-kernel, ak, suresh.b.siddha, discuss
Ashok Raj <ashok.raj@intel.com> wrote:
>
> +/*
> + * mode
> + * 0 indicates its for __cpu_up to kick an AP into boot sequence.
> + * 1 indicates completion os smp boot process, so we can zap the low
> + * until there is need to bring a cpu up again.
> + */
> +__cpuinit void zap_low_mappings(int mode)
grump. `mode' is a terrible identifier name. Better to call it something
which identifies its meaning if true, such as `do_mapping_zapping' or
something.
Even better, implement two nicely-named functions rather than passing in a
`mode' argument. Those functions can of course share a common
mode-selected implementation internally.
And that comment is incomprehensible, partly due to all its typos. Care to
try again?
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-21 21:11 ` Andrew Morton
@ 2005-09-21 22:39 ` Ashok Raj
0 siblings, 0 replies; 7+ messages in thread
From: Ashok Raj @ 2005-09-21 22:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ashok Raj, linux-kernel, ak, suresh.b.siddha, discuss
On Wed, Sep 21, 2005 at 02:11:57PM -0700, Andrew Morton wrote:
> Ashok Raj <ashok.raj@intel.com> wrote:
> >
> > +/*
> > + * mode
> > + * 0 indicates its for __cpu_up to kick an AP into boot sequence.
> > + * 1 indicates completion os smp boot process, so we can zap the low
> > + * until there is need to bring a cpu up again.
> > + */
> > +__cpuinit void zap_low_mappings(int mode)
>
> grump. `mode' is a terrible identifier name. Better to call it something
> which identifies its meaning if true, such as `do_mapping_zapping' or
> something.
>
Valid Grump!
>
> And that comment is incomprehensible, partly due to all its typos. Care to
> try again?
>
Something had to be done about zap and corruption in this patch :-). In fact, I
accidentally deleted some of the text from Suresh when I sent it earlier.
I removed the parameter and gave the variable a more meaningful name.
Below are the re-worked patches.
--
Cheers,
Ashok Raj
- Open Source Technology Center
We need to dynamically map and unmap the low address mappings on demand.
Originally we left these low address mappings unzapped since they were
required later for cpu hotplug. Due to the following sighting, we now do
this dynamically on demand, just before we bring up a cpu, and then zap
them when the secondary boot is complete. We defer the zap during first
boot until the smpboot process is complete.
From Suresh:
Identity mapped low address mappings cause corruption when udev process is
spawned early in boot. Since these low mappings are mapped global, cr3 writes
(during context switches to udev process) will not flush these identity low
mappings. These low mappings will be used by the kernel when it writes into
the udev/hotplug user space, thereby corrupting kernel memory.
Thanks to Suresh for identifying the problem and helping come up with the fix.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Suresh B Siddha <suresh.b.siddha@intel.com>
---------------------------------------------------
arch/x86_64/kernel/smpboot.c | 11 ++++++++---
arch/x86_64/mm/init.c | 24 +++++++++++++++++++++---
include/asm-x86_64/smp.h | 2 ++
3 files changed, 31 insertions(+), 6 deletions(-)
Index: linux-2.6.14-rc1-mm1/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.14-rc1-mm1.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.14-rc1-mm1/arch/x86_64/kernel/smpboot.c
@@ -748,6 +748,12 @@ do_rest:
cpu_pda[cpu].pcurrent = c_idle.idle;
+ /*
+ * Re-establish low mappings to facilitate boot.
+ * zap it back when boot process is complete.
+ */
+ init_low_mappings();
+
start_rip = setup_trampoline();
init_rsp = c_idle.idle->thread.rsp;
@@ -829,6 +835,7 @@ do_rest:
#endif
}
}
+ zap_low_mappings();
if (boot_error) {
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
clear_bit(cpu, &cpu_initialized); /* was set by cpu_init() */
@@ -1067,9 +1074,7 @@ int __cpuinit __cpu_up(unsigned int cpu)
*/
void __init smp_cpus_done(unsigned int max_cpus)
{
-#ifndef CONFIG_HOTPLUG_CPU
- zap_low_mappings();
-#endif
+ zap_low_after_boot();
smp_cleanup_boot();
#ifdef CONFIG_X86_IO_APIC
Index: linux-2.6.14-rc1-mm1/arch/x86_64/mm/init.c
===================================================================
--- linux-2.6.14-rc1-mm1.orig/arch/x86_64/mm/init.c
+++ linux-2.6.14-rc1-mm1/arch/x86_64/mm/init.c
@@ -311,15 +311,33 @@ void __init init_memory_mapping(unsigned
}
extern struct x8664_pda cpu_pda[NR_CPUS];
+__cpuinitdata static int low_mem_valid = 1;
/* Assumes all CPUs still execute in init_mm */
-void zap_low_mappings(void)
+__cpuinit void init_low_mappings(void)
{
- pgd_t *pgd = pgd_offset_k(0UL);
+ if (!low_mem_valid)
+ set_pgd(pgd_offset_k(0UL), *pgd_offset_k(PAGE_OFFSET));
+}
+
+__cpuinit void zap_low_mappings(void)
+{
+ pgd_t *pgd;
+
+ if (low_mem_valid)
+ return;
+
+ pgd = pgd_offset_k(0UL);
pgd_clear(pgd);
flush_tlb_all();
}
+__cpuinit void zap_low_after_boot(void)
+{
+ low_mem_valid = 0;
+ zap_low_mappings();
+}
+
/* Compute zone sizes for the DMA and DMA32 zones in a node. */
__init void
size_zones(unsigned long *z, unsigned long *h,
@@ -481,7 +499,7 @@ void __init mem_init(void)
* the WP-bit has been tested.
*/
#ifndef CONFIG_SMP
- zap_low_mappings();
+ zap_low_after_boot();
#endif
}
Index: linux-2.6.14-rc1-mm1/include/asm-x86_64/smp.h
===================================================================
--- linux-2.6.14-rc1-mm1.orig/include/asm-x86_64/smp.h
+++ linux-2.6.14-rc1-mm1/include/asm-x86_64/smp.h
@@ -47,7 +47,9 @@ extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
+extern void init_low_mappings(void);
extern void zap_low_mappings(void);
+extern void zap_low_after_boot(void);
void smp_stop_cpu(void);
extern int smp_call_function_single(int cpuid, void (*func) (void *info),
void *info, int retry, int wait);
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-21 20:57 init and zap low address mappings on demand for cpu hotplug Ashok Raj
2005-09-21 21:11 ` Andrew Morton
@ 2005-09-22 9:48 ` Andi Kleen
2005-09-24 0:28 ` Siddha, Suresh B
1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2005-09-22 9:48 UTC (permalink / raw)
To: Ashok Raj; +Cc: linux-kernel, akpm, suresh.b.siddha, discuss
On Wed, Sep 21, 2005 at 01:57:31PM -0700, Ashok Raj wrote:
> Hi,
>
> To simplify cpu hotplug we did not zap the low memory mappings, since we would need
> them after boot to bring up a new cpu. This caused bad effects when
> Suresh was testing some new code. More below.
This seems racy - how do you prevent udev running on another
CPU while another CPU boots? I suspect you need additional locks
to plug this race. Or use a fresh mm cloned from init_mm to do the
CPU bootup.
I don't like zap_low_first_time - it shouldn't be needed. In general
people have been complaining on i386 and x86-64 that we don't
unmap NULL early, so we don't catch bugs that happen on other
architectures.
Using a fresh mm for smp bootup would solve this nicely - one could
zap init_mm really early after entering from head.S and then
only ever undo it in private mms for smp bootup.
-Andi
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-22 9:48 ` Andi Kleen
@ 2005-09-24 0:28 ` Siddha, Suresh B
2005-09-26 6:58 ` Andi Kleen
0 siblings, 1 reply; 7+ messages in thread
From: Siddha, Suresh B @ 2005-09-24 0:28 UTC (permalink / raw)
To: Andi Kleen; +Cc: Ashok Raj, linux-kernel, akpm, suresh.b.siddha, discuss
On Thu, Sep 22, 2005 at 11:48:18AM +0200, Andi Kleen wrote:
> On Wed, Sep 21, 2005 at 01:57:31PM -0700, Ashok Raj wrote:
> > Hi,
> >
> > To simplify cpu hotplug we did not zap the low memory mappings, since we would need
> > them after boot to bring up a new cpu. This caused bad effects when
> > Suresh was testing some new code. More below.
>
> This seems racy - how do you prevent udev running on another
> CPU while another CPU boots? I suspect you need additional locks
> to plug this race. Or use a fresh mm cloned from init_mm to do the
> CPU bootup.
>
> I don't like zap_low_first_time - it shouldn't be needed. In general
> people have been complaining on i386 and x86-64 that we don't
> unmap NULL early, so we don't catch bugs that happen on other
> architectures.
>
> Using a fresh mm for smp bootup would solve this nicely - one could
> zap init_mm really early after entering from head.S and then
> only ever undo it in private mms for smp bootup.
One of my initial recommendations to fix the race was to use non-global mappings
for the low identity mappings. As you also bring up the issue of zapping the low
mappings very early, how about the appended patch? If it is OK with you, I will
do a similar one for i386.
--
Low address mappings are not zapped after boot, as they are required later for
cpu hotplug. These identity-mapped low address mappings cause corruption
when the udev process is spawned early in boot. Since these low mappings
are mapped global, cr3 writes (during context switches to the udev process)
will not flush these identity low mappings. These low mappings will be
used by the kernel when it writes into the udev/hotplug user space,
thereby corrupting kernel memory.
Andi Kleen also brought up another point. We should zap these low mappings
as soon as possible, so that we can catch kernel bugs more effectively.
This patch introduces boot_level4_pgt, which will always have the low identity
addresses mapped. During boot, all the processors will use this as their
level4 pgt. On the BP, we will switch to init_level4_pgt as soon as we enter
C code and zap the low mappings as soon as we are done with the use of the
identity-mapped low addresses. On APs we will zap the low mappings as
soon as we jump to C code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
diff -Npru linux-2.6.14-rc1/arch/i386/kernel/acpi/boot.c linux-2.6.14-rc1.hot/arch/i386/kernel/acpi/boot.c
--- linux-2.6.14-rc1/arch/i386/kernel/acpi/boot.c 2005-09-23 16:14:40.326091080 -0700
+++ linux-2.6.14-rc1.hot/arch/i386/kernel/acpi/boot.c 2005-09-23 10:31:28.105584432 -0700
@@ -544,7 +544,7 @@ acpi_scan_rsdp(unsigned long start, unsi
* RSDP signature.
*/
for (offset = 0; offset < length; offset += 16) {
- if (strncmp((char *)(start + offset), "RSD PTR ", sig_len))
+ if (strncmp((char *)(__va(start) + offset), "RSD PTR ", sig_len))
continue;
return (start + offset);
}
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/head.S linux-2.6.14-rc1.hot/arch/x86_64/kernel/head.S
--- linux-2.6.14-rc1/arch/x86_64/kernel/head.S 2005-09-23 16:14:40.327090928 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/head.S 2005-09-22 19:09:36.995151320 -0700
@@ -70,7 +70,7 @@ startup_32:
movl %eax, %cr4
/* Setup early boot stage 4 level pagetables */
- movl $(init_level4_pgt - __START_KERNEL_map), %eax
+ movl $(boot_level4_pgt - __START_KERNEL_map), %eax
movl %eax, %cr3
/* Setup EFER (Extended Feature Enable Register) */
@@ -113,7 +113,7 @@ startup_64:
movq %rax, %cr4
/* Setup early boot stage 4 level pagetables. */
- movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ movq $(boot_level4_pgt - __START_KERNEL_map), %rax
movq %rax, %cr3
/* Check if nx is implemented */
@@ -247,7 +247,7 @@ ENTRY(_stext)
* 2Mbyte large pages provided by PAE mode)
*/
.org 0x1000
-ENTRY(init_level4_pgt)
+ENTRY(boot_level4_pgt)
.quad 0x0000000000002007 + __PHYSICAL_START /* -> level3_ident_pgt */
.fill 255,8,0
.quad 0x000000000000a007 + __PHYSICAL_START
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/head64.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/head64.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/head64.c 2005-09-23 16:14:40.327090928 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/head64.c 2005-09-23 14:52:50.576486472 -0700
@@ -19,6 +19,7 @@
#include <asm/bootsetup.h>
#include <asm/setup.h>
#include <asm/desc.h>
+#include <asm/pgtable.h>
/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
@@ -76,6 +77,7 @@ static void __init setup_boot_cpu_data(v
}
extern char _end[];
+pgd_t init_level4_pgt[PTRS_PER_PGD];
void __init x86_64_start_kernel(char * real_mode_data)
{
@@ -86,6 +88,13 @@ void __init x86_64_start_kernel(char * r
set_intr_gate(i, early_idt_handler);
asm volatile("lidt %0" :: "m" (idt_descr));
clear_bss();
+
+ /*
+ * switch to init_level4_pgt from boot_level4_pgt
+ */
+ memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));
+ asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
+
pda_init(0);
copy_bootdata(real_mode_data);
#ifdef CONFIG_SMP
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/setup.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/setup.c 2005-09-23 16:14:40.328090776 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup.c 2005-09-23 16:11:14.567371160 -0700
@@ -571,6 +571,8 @@ void __init setup_arch(char **cmdline_p)
init_memory_mapping(0, (end_pfn_map << PAGE_SHIFT));
+ zap_low_mappings(0);
+
#ifdef CONFIG_ACPI
/*
* Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/setup64.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup64.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/setup64.c 2005-09-23 16:14:40.328090776 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup64.c 2005-09-23 11:21:58.385911800 -0700
@@ -137,7 +137,6 @@ void pda_init(int cpu)
panic("cannot allocate irqstack for cpu %d", cpu);
}
- asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
pda->irqstackptr += IRQSTACKSIZE-64;
}
@@ -193,6 +192,7 @@ void __cpuinit cpu_init (void)
/* CPU 0 is initialised in head64.c */
if (cpu != 0) {
pda_init(cpu);
+ zap_low_mappings(cpu);
} else
estacks = boot_exception_stacks;
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/smpboot.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/smpboot.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/smpboot.c 2005-09-23 16:14:40.329090624 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/smpboot.c 2005-09-23 11:13:10.729127824 -0700
@@ -1067,9 +1067,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
*/
void __init smp_cpus_done(unsigned int max_cpus)
{
-#ifndef CONFIG_HOTPLUG_CPU
- zap_low_mappings();
-#endif
smp_cleanup_boot();
#ifdef CONFIG_X86_IO_APIC
diff -Npru linux-2.6.14-rc1/arch/x86_64/mm/init.c linux-2.6.14-rc1.hot/arch/x86_64/mm/init.c
--- linux-2.6.14-rc1/arch/x86_64/mm/init.c 2005-09-23 16:14:40.329090624 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/mm/init.c 2005-09-23 16:45:41.477153112 -0700
@@ -310,12 +310,19 @@ void __init init_memory_mapping(unsigned
extern struct x8664_pda cpu_pda[NR_CPUS];
-/* Assumes all CPUs still execute in init_mm */
-void zap_low_mappings(void)
+void __cpuinit zap_low_mappings(int cpu)
{
- pgd_t *pgd = pgd_offset_k(0UL);
- pgd_clear(pgd);
- flush_tlb_all();
+ if (cpu == 0) {
+ pgd_t *pgd = pgd_offset_k(0UL);
+ pgd_clear(pgd);
+ } else {
+ /*
+ * For AP's, zap the low identity mappings by changing the cr3
+ * to init_level4_pgt and doing local flush tlb all
+ */
+ asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
+ }
+ __flush_tlb_all();
}
#ifndef CONFIG_NUMA
@@ -438,14 +445,13 @@ void __init mem_init(void)
datasize >> 10,
initsize >> 10);
+#ifdef CONFIG_SMP
/*
- * Subtle. SMP is doing its boot stuff late (because it has to
- * fork idle threads) - but it also needs low mappings for the
- * protected-mode entry to work. We zap these entries only after
- * the WP-bit has been tested.
+ * Sync boot_level4_pgt mappings with the init_level4_pgt
+ * except for the low identity mappings which are already zapped
+ * in init_level4_pgt. This sync-up is essential for AP's bringup
*/
-#ifndef CONFIG_SMP
- zap_low_mappings();
+ memcpy(boot_level4_pgt+1, init_level4_pgt+1, (PTRS_PER_PGD-1)*sizeof(pgd_t));
#endif
}
diff -Npru linux-2.6.14-rc1/include/asm-x86_64/pgtable.h linux-2.6.14-rc1.hot/include/asm-x86_64/pgtable.h
--- linux-2.6.14-rc1/include/asm-x86_64/pgtable.h 2005-09-23 16:14:40.330090472 -0700
+++ linux-2.6.14-rc1.hot/include/asm-x86_64/pgtable.h 2005-09-22 12:10:36.673061208 -0700
@@ -16,6 +16,7 @@ extern pud_t level3_physmem_pgt[512];
extern pud_t level3_ident_pgt[512];
extern pmd_t level2_kernel_pgt[512];
extern pgd_t init_level4_pgt[];
+extern pgd_t boot_level4_pgt[];
extern unsigned long __supported_pte_mask;
#define swapper_pg_dir init_level4_pgt
diff -Npru linux-2.6.14-rc1/include/asm-x86_64/smp.h linux-2.6.14-rc1.hot/include/asm-x86_64/smp.h
--- linux-2.6.14-rc1/include/asm-x86_64/smp.h 2005-09-23 16:14:40.330090472 -0700
+++ linux-2.6.14-rc1.hot/include/asm-x86_64/smp.h 2005-09-23 11:22:55.218271968 -0700
@@ -47,7 +47,7 @@ extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
-extern void zap_low_mappings(void);
+extern void zap_low_mappings(int cpu);
void smp_stop_cpu(void);
extern int smp_call_function_single(int cpuid, void (*func) (void *info),
void *info, int retry, int wait);
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-24 0:28 ` Siddha, Suresh B
@ 2005-09-26 6:58 ` Andi Kleen
2005-09-26 23:19 ` Siddha, Suresh B
0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2005-09-26 6:58 UTC (permalink / raw)
To: Siddha, Suresh B; +Cc: Ashok Raj, linux-kernel, akpm, discuss
> This patch introduces boot_level4_pgt, which will always have low identity
> addresses mapped. During boot, all the processors will use this as their
> level4 pgt. On BP, we will switch to init_level4_pgt as soon as we enter
> C code and zap the low mappings as soon as we are done with the usage of
> identity low mapped addresses. On AP's we will zap the low mappings as
> soon as we jump to C code.
Looks good. Thanks Suresh. The boot page tables should be marked __initdata
now, but that can be done in a follow-on patch.
i386 should probably get similar treatment.
-Andi
* Re: init and zap low address mappings on demand for cpu hotplug
2005-09-26 6:58 ` Andi Kleen
@ 2005-09-26 23:19 ` Siddha, Suresh B
0 siblings, 0 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2005-09-26 23:19 UTC (permalink / raw)
To: Andi Kleen; +Cc: Siddha, Suresh B, Ashok Raj, linux-kernel, akpm, discuss
On Mon, Sep 26, 2005 at 08:58:56AM +0200, Andi Kleen wrote:
> Looks good. Thanks Suresh. The boot page tables should be marked __initdata
> now, but that can be done in a follow-on patch.
Appended new complete patch. Andrew, please apply.
> i386 should probably get similar treatment.
Will do it sometime soon.
--
Low address mappings are not zapped after boot, as they are required later for
cpu hotplug. These identity-mapped low address mappings cause corruption
when the udev process is spawned early in boot. Since these low mappings
are mapped global, cr3 writes (during context switches to the udev process)
will not flush these identity low mappings. These low mappings will be
used by the kernel when it writes into the udev/hotplug user space,
thereby corrupting kernel memory.
Andi Kleen also brought up another point. We should zap these low mappings
as soon as possible, so that we can catch kernel bugs more effectively.
This patch introduces boot_level4_pgt, which will always have the low identity
addresses mapped. During boot, all the processors will use this as their
level4 pgt. On the BP, we will switch to init_level4_pgt as soon as we enter
C code and zap the low mappings as soon as we are done with the use of the
identity-mapped low addresses. On APs we will zap the low mappings as
soon as we jump to C code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
diff -Npru linux-2.6.14-rc1/arch/i386/kernel/acpi/boot.c linux-2.6.14-rc1.hot/arch/i386/kernel/acpi/boot.c
--- linux-2.6.14-rc1/arch/i386/kernel/acpi/boot.c 2005-09-23 16:14:40.326091080 -0700
+++ linux-2.6.14-rc1.hot/arch/i386/kernel/acpi/boot.c 2005-09-26 14:49:16.606625368 -0700
@@ -544,7 +544,7 @@ acpi_scan_rsdp(unsigned long start, unsi
* RSDP signature.
*/
for (offset = 0; offset < length; offset += 16) {
- if (strncmp((char *)(start + offset), "RSD PTR ", sig_len))
+ if (strncmp((char *)(__va(start) + offset), "RSD PTR ", sig_len))
continue;
return (start + offset);
}
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/head.S linux-2.6.14-rc1.hot/arch/x86_64/kernel/head.S
--- linux-2.6.14-rc1/arch/x86_64/kernel/head.S 2005-09-23 16:14:40.327090928 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/head.S 2005-09-26 15:30:20.645034592 -0700
@@ -12,6 +12,7 @@
#include <linux/linkage.h>
#include <linux/threads.h>
+#include <linux/init.h>
#include <asm/desc.h>
#include <asm/segment.h>
#include <asm/page.h>
@@ -70,7 +71,7 @@ startup_32:
movl %eax, %cr4
/* Setup early boot stage 4 level pagetables */
- movl $(init_level4_pgt - __START_KERNEL_map), %eax
+ movl $(boot_level4_pgt - __START_KERNEL_map), %eax
movl %eax, %cr3
/* Setup EFER (Extended Feature Enable Register) */
@@ -113,7 +114,7 @@ startup_64:
movq %rax, %cr4
/* Setup early boot stage 4 level pagetables. */
- movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ movq $(boot_level4_pgt - __START_KERNEL_map), %rax
movq %rax, %cr3
/* Check if nx is implemented */
@@ -240,20 +241,10 @@ ljumpvector:
ENTRY(stext)
ENTRY(_stext)
- /*
- * This default setting generates an ident mapping at address 0x100000
- * and a mapping for the kernel that precisely maps virtual address
- * 0xffffffff80000000 to physical address 0x000000. (always using
- * 2Mbyte large pages provided by PAE mode)
- */
.org 0x1000
ENTRY(init_level4_pgt)
- .quad 0x0000000000002007 + __PHYSICAL_START /* -> level3_ident_pgt */
- .fill 255,8,0
- .quad 0x000000000000a007 + __PHYSICAL_START
- .fill 254,8,0
- /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
- .quad 0x0000000000003007 + __PHYSICAL_START /* -> level3_kernel_pgt */
+ /* This gets initialized in x86_64_start_kernel */
+ .fill 512,8,0
.org 0x2000
ENTRY(level3_ident_pgt)
@@ -350,6 +341,24 @@ ENTRY(wakeup_level4_pgt)
.quad 0x0000000000003007 + __PHYSICAL_START /* -> level3_kernel_pgt */
#endif
+#ifndef CONFIG_HOTPLUG_CPU
+ __INITDATA
+#endif
+ /*
+ * This default setting generates an ident mapping at address 0x100000
+ * and a mapping for the kernel that precisely maps virtual address
+ * 0xffffffff80000000 to physical address 0x000000. (always using
+ * 2Mbyte large pages provided by PAE mode)
+ */
+ .align PAGE_SIZE
+ENTRY(boot_level4_pgt)
+ .quad 0x0000000000002007 + __PHYSICAL_START /* -> level3_ident_pgt */
+ .fill 255,8,0
+ .quad 0x000000000000a007 + __PHYSICAL_START
+ .fill 254,8,0
+ /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+ .quad 0x0000000000003007 + __PHYSICAL_START /* -> level3_kernel_pgt */
+
.data
.align 16
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/head64.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/head64.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/head64.c 2005-09-23 16:14:40.327090928 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/head64.c 2005-09-26 14:52:31.678969864 -0700
@@ -19,6 +19,7 @@
#include <asm/bootsetup.h>
#include <asm/setup.h>
#include <asm/desc.h>
+#include <asm/pgtable.h>
/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
@@ -86,6 +87,13 @@ void __init x86_64_start_kernel(char * r
set_intr_gate(i, early_idt_handler);
asm volatile("lidt %0" :: "m" (idt_descr));
clear_bss();
+
+ /*
+ * switch to init_level4_pgt from boot_level4_pgt
+ */
+ memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));
+ asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
+
pda_init(0);
copy_bootdata(real_mode_data);
#ifdef CONFIG_SMP
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/setup.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/setup.c 2005-09-23 16:14:40.328090776 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup.c 2005-09-26 14:49:16.607625216 -0700
@@ -571,6 +571,8 @@ void __init setup_arch(char **cmdline_p)
init_memory_mapping(0, (end_pfn_map << PAGE_SHIFT));
+ zap_low_mappings(0);
+
#ifdef CONFIG_ACPI
/*
* Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/setup64.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup64.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/setup64.c 2005-09-23 16:14:40.328090776 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/setup64.c 2005-09-26 14:49:16.608625064 -0700
@@ -137,7 +137,6 @@ void pda_init(int cpu)
panic("cannot allocate irqstack for cpu %d", cpu);
}
- asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
pda->irqstackptr += IRQSTACKSIZE-64;
}
@@ -193,6 +192,7 @@ void __cpuinit cpu_init (void)
/* CPU 0 is initialised in head64.c */
if (cpu != 0) {
pda_init(cpu);
+ zap_low_mappings(cpu);
} else
estacks = boot_exception_stacks;
diff -Npru linux-2.6.14-rc1/arch/x86_64/kernel/smpboot.c linux-2.6.14-rc1.hot/arch/x86_64/kernel/smpboot.c
--- linux-2.6.14-rc1/arch/x86_64/kernel/smpboot.c 2005-09-23 16:14:40.329090624 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/kernel/smpboot.c 2005-09-26 14:49:16.608625064 -0700
@@ -1067,9 +1067,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
*/
void __init smp_cpus_done(unsigned int max_cpus)
{
-#ifndef CONFIG_HOTPLUG_CPU
- zap_low_mappings();
-#endif
smp_cleanup_boot();
#ifdef CONFIG_X86_IO_APIC
diff -Npru linux-2.6.14-rc1/arch/x86_64/mm/init.c linux-2.6.14-rc1.hot/arch/x86_64/mm/init.c
--- linux-2.6.14-rc1/arch/x86_64/mm/init.c 2005-09-23 16:14:40.329090624 -0700
+++ linux-2.6.14-rc1.hot/arch/x86_64/mm/init.c 2005-09-26 14:49:16.609624912 -0700
@@ -310,12 +310,19 @@ void __init init_memory_mapping(unsigned
extern struct x8664_pda cpu_pda[NR_CPUS];
-/* Assumes all CPUs still execute in init_mm */
-void zap_low_mappings(void)
+void __cpuinit zap_low_mappings(int cpu)
{
- pgd_t *pgd = pgd_offset_k(0UL);
- pgd_clear(pgd);
- flush_tlb_all();
+ if (cpu == 0) {
+ pgd_t *pgd = pgd_offset_k(0UL);
+ pgd_clear(pgd);
+ } else {
+ /*
+ * For AP's, zap the low identity mappings by changing the cr3
+ * to init_level4_pgt and doing local flush tlb all
+ */
+ asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
+ }
+ __flush_tlb_all();
}
#ifndef CONFIG_NUMA
@@ -438,14 +445,13 @@ void __init mem_init(void)
datasize >> 10,
initsize >> 10);
+#ifdef CONFIG_SMP
/*
- * Subtle. SMP is doing its boot stuff late (because it has to
- * fork idle threads) - but it also needs low mappings for the
- * protected-mode entry to work. We zap these entries only after
- * the WP-bit has been tested.
+ * Sync boot_level4_pgt mappings with the init_level4_pgt
+ * except for the low identity mappings which are already zapped
+ * in init_level4_pgt. This sync-up is essential for AP's bringup
*/
-#ifndef CONFIG_SMP
- zap_low_mappings();
+ memcpy(boot_level4_pgt+1, init_level4_pgt+1, (PTRS_PER_PGD-1)*sizeof(pgd_t));
#endif
}
diff -Npru linux-2.6.14-rc1/include/asm-x86_64/pgtable.h linux-2.6.14-rc1.hot/include/asm-x86_64/pgtable.h
--- linux-2.6.14-rc1/include/asm-x86_64/pgtable.h 2005-09-23 16:14:40.330090472 -0700
+++ linux-2.6.14-rc1.hot/include/asm-x86_64/pgtable.h 2005-09-26 14:49:16.609624912 -0700
@@ -16,6 +16,7 @@ extern pud_t level3_physmem_pgt[512];
extern pud_t level3_ident_pgt[512];
extern pmd_t level2_kernel_pgt[512];
extern pgd_t init_level4_pgt[];
+extern pgd_t boot_level4_pgt[];
extern unsigned long __supported_pte_mask;
#define swapper_pg_dir init_level4_pgt
diff -Npru linux-2.6.14-rc1/include/asm-x86_64/smp.h linux-2.6.14-rc1.hot/include/asm-x86_64/smp.h
--- linux-2.6.14-rc1/include/asm-x86_64/smp.h 2005-09-23 16:14:40.330090472 -0700
+++ linux-2.6.14-rc1.hot/include/asm-x86_64/smp.h 2005-09-26 14:49:16.610624760 -0700
@@ -47,7 +47,7 @@ extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
-extern void zap_low_mappings(void);
+extern void zap_low_mappings(int cpu);
void smp_stop_cpu(void);
extern int smp_call_function_single(int cpuid, void (*func) (void *info),
void *info, int retry, int wait);