* [PATCH 0/15] Improve x86 smpboot integration @ 2008-06-09 14:16 Glauber Costa 2008-06-09 14:16 ` [PATCH 01/15] x86: use stack_start in x86_64 Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh Ingo, First of all, sorry for being away for so long ;-) Here it goes a series of improvements for the x86 smpboot integration. The final goal is the same: To reduce the difference between architectures, just this time I do a little bit more ;-) Basically, this one reduces greatly the ifdef count: [root@t60 linux-2.6-x86]# git-show 674ce:arch/x86/kernel/smpboot.c \ | grep -e "^#ifdef \+CONFIG_X86_\(32\|64\)" | wc -l 18 [root@t60 linux-2.6-x86]# git-show HEAD:arch/x86/kernel/smpboot.c \ | grep -e "^#ifdef \+CONFIG_X86_\(32\|64\)" | wc -l 8 The remaining ones are _mainly_ (not all) due to the fact that x86_64 uses the pda, while i386 goes with normal per-cpu data. They are things like: per_cpu(current_task, cpu) = c_idle.idle; vs cpu_pda(cpu)->pcurrent = c_idle.idle; Also, there is the low mappings piece of code Hugh detected. To that, I intend to also find a common base between them. Just I'm not doing it in this series, because it deserves special attention. As usual, this series was compiled tested in a whole bunch of different configs ( ~ 10 for each architecture), including all i386 variants. Boot tested in all my hardware. The final diffstat is: arch/x86/kernel/head_32.S | 2 arch/x86/kernel/head_64.S | 48 +--------------- arch/x86/kernel/io_apic_32.c | 2 arch/x86/kernel/setup64.c | 1 arch/x86/kernel/smpboot.c | 97 ++++------------------------------ b/arch/x86/kernel/acpi/sleep.c | 2 b/arch/x86/kernel/apic_32.c | 6 -- b/arch/x86/kernel/apic_64.c | 10 +++ b/arch/x86/kernel/head_32.S | 6 +- b/arch/x86/kernel/head_64.S | 5 + b/arch/x86/kernel/io_apic_32.c | 5 + b/arch/x86/kernel/io_apic_64.c | 9 ++- b/arch/x86/kernel/process_32.c | 16 +++++ b/arch/x86/kernel/setup64.c | 5 - b/arch/x86/kernel/setup_64.c | 27 +++++++++ b/arch/x86/kernel/smpboot.c | 8 -- b/arch/x86/kernel/traps_32.c | 3 - b/arch/x86/kernel/x8664_ksyms_64.c | 5 - b/arch/x86/mach-voyager/voyager_smp.c | 2 b/include/asm-x86/desc.h | 24 +++----- b/include/asm-x86/hw_irq.h | 3 - b/include/asm-x86/numa_32.h | 1 b/include/asm-x86/segment.h | 24 ++++---- b/include/asm-x86/smp.h | 2 include/asm-x86/hw_irq.h | 4 - 25 files changed, 122 insertions(+), 195 deletions(-) ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 01/15] x86: use stack_start in x86_64 2008-06-09 14:16 [PATCH 0/15] Improve x86 smpboot integration Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 02/15] x86: don't use gdt_page openly Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh call x86_64's init_rsp stack_start, just as i386 does. Put a zeroed stack segment for consistency. With this, we can eliminate one ugly ifdef in smpboot.c. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/acpi/sleep.c | 2 +- arch/x86/kernel/head_64.S | 5 +++-- arch/x86/kernel/smpboot.c | 7 +------ 3 files changed, 5 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index afc25ee..0103af2 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -72,7 +72,7 @@ int acpi_save_state_mem(void) saved_magic = 0x12345678; #else /* CONFIG_64BIT */ header->trampoline_segment = setup_trampoline() >> 4; - init_rsp = (unsigned long)temp_stack + 4096; + stack_start.sp = temp_stack + 4096; initial_code = (unsigned long)wakeup_long64; saved_magic = 0x123456789abcdef0; #endif /* CONFIG_64BIT */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index d8ed325..4949c32 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -191,7 +191,7 @@ ENTRY(secondary_startup_64) movq %rax, %cr0 /* Setup a boot time stack */ - movq init_rsp(%rip),%rsp + movq stack_start(%rip),%rsp /* zero EFLAGS after setting rsp */ pushq $0 @@ -252,8 +252,9 @@ ENTRY(secondary_startup_64) .quad x86_64_start_kernel __FINITDATA - ENTRY(init_rsp) + ENTRY(stack_start) .quad init_thread_union+THREAD_SIZE-8 + .word 0 bad_address: jmp bad_address diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 722b24b..c77a7d2 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -714,11 +714,7 @@ wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip) * target processor state. */ startup_ipi_hook(phys_apicid, (unsigned long) start_secondary, -#ifdef CONFIG_X86_64 - (unsigned long)init_rsp); -#else (unsigned long)stack_start.sp); -#endif /* * Run STARTUP IPI loop. @@ -905,15 +901,14 @@ do_rest: early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu); c_idle.idle->thread.ip = (unsigned long) start_secondary; /* Stack for startup_32 can be just as for start_secondary onwards */ - stack_start.sp = (void *) c_idle.idle->thread.sp; irq_ctx_init(cpu); #else cpu_pda(cpu)->pcurrent = c_idle.idle; - init_rsp = c_idle.idle->thread.sp; load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread); initial_code = (unsigned long)start_secondary; clear_tsk_thread_flag(c_idle.idle, TIF_FORK); #endif + stack_start.sp = (void *) c_idle.idle->thread.sp; /* start_ip had better be page-aligned! */ start_ip = setup_trampoline(); -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 02/15] x86: don't use gdt_page openly. 2008-06-09 14:16 ` [PATCH 01/15] x86: use stack_start in x86_64 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 03/15] x86: remove early_gdt_descr reference Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh There's a macro available for that. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/traps_32.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c index a92b0c7..9ad896d 100644 --- a/arch/x86/kernel/traps_32.c +++ b/arch/x86/kernel/traps_32.c @@ -1135,7 +1135,7 @@ void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code) unsigned long patch_espfix_desc(unsigned long uesp, unsigned long kesp) { - struct desc_struct *gdt = __get_cpu_var(gdt_page).gdt; + struct desc_struct *gdt = get_cpu_gdt_table(smp_processor_id()); unsigned long base = (kesp - uesp) & -THREAD_SIZE; unsigned long new_kesp = kesp - base; unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT; -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 14:16 ` [PATCH 02/15] x86: don't use gdt_page openly Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 04/15] x86: move x86_64 gdt closer to i386 Glauber Costa 2008-06-09 15:23 ` [PATCH 03/15] x86: remove early_gdt_descr reference James Bottomley 0 siblings, 2 replies; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh, James Bottomley since we use switch_to_new_gdt, there is no point in assigning early_gdt_descr except for the first assignment, which is done manually. Signed-off-by: Glauber Costa <gcosta@redhat.com> CC: James Bottomley <James.Bottomley@HansenPartnership.com> --- arch/x86/kernel/smpboot.c | 1 - arch/x86/mach-voyager/voyager_smp.c | 1 - 2 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c77a7d2..181b109 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -898,7 +898,6 @@ do_rest: #ifdef CONFIG_X86_32 per_cpu(current_task, cpu) = c_idle.idle; init_gdt(cpu); - early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu); c_idle.idle->thread.ip = (unsigned long) start_secondary; /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c index 8dedd01..6b97d9b 100644 --- a/arch/x86/mach-voyager/voyager_smp.c +++ b/arch/x86/mach-voyager/voyager_smp.c @@ -529,7 +529,6 @@ static void __init do_boot_cpu(__u8 cpu) init_gdt(cpu); per_cpu(current_task, cpu) = idle; - early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu); irq_ctx_init(cpu); /* Note: Don't modify initial ss override */ -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 04/15] x86: move x86_64 gdt closer to i386 2008-06-09 14:16 ` [PATCH 03/15] x86: remove early_gdt_descr reference Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 05/15] x86: use initial_code for i386 Glauber Costa 2008-06-09 15:23 ` [PATCH 03/15] x86: remove early_gdt_descr reference James Bottomley 1 sibling, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh i386 and x86_64 used two different schemes for maintaining the gdt. With this patch, x86_64 initial gdt table is defined in a .c file, same way as i386 is now. Also, we call it "gdt_page", and the descriptor, "early_gdt_descr". This way we achieve common naming, which can allow for more code integration. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/head_64.S | 48 ++++---------------------------------- arch/x86/kernel/setup64.c | 5 +--- arch/x86/kernel/setup_64.c | 27 +++++++++++++++++++++ arch/x86/kernel/smpboot.c | 10 +------ arch/x86/kernel/x8664_ksyms_64.c | 5 ---- include/asm-x86/desc.h | 24 ++++++++----------- include/asm-x86/segment.h | 23 +++++++++-------- 7 files changed, 57 insertions(+), 85 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 4949c32..febeabb 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -203,7 +203,7 @@ ENTRY(secondary_startup_64) * addresses where we're currently running on. We have to do that here * because in 32bit we couldn't load a 64bit linear address. */ - lgdt cpu_gdt_descr(%rip) + lgdt early_gdt_descr(%rip) /* set up data segments. actually 0 would do too */ movl $__KERNEL_DS,%eax @@ -391,54 +391,16 @@ NEXT_PAGE(level2_spare_pgt) .data .align 16 - .globl cpu_gdt_descr -cpu_gdt_descr: - .word gdt_end-cpu_gdt_table-1 -gdt: - .quad cpu_gdt_table -#ifdef CONFIG_SMP - .rept NR_CPUS-1 - .word 0 - .quad 0 - .endr -#endif + .globl early_gdt_descr +early_gdt_descr: + .word GDT_ENTRIES*8-1 + .quad per_cpu__gdt_page ENTRY(phys_base) /* This must match the first entry in level2_kernel_pgt */ .quad 0x0000000000000000 -/* We need valid kernel segments for data and code in long mode too - * IRET will check the segment types kkeil 2000/10/28 - * Also sysret mandates a special GDT layout - */ - - .section .data.page_aligned, "aw" - .align PAGE_SIZE - -/* The TLS descriptors are currently at a different place compared to i386. - Hopefully nobody expects them at a fixed place (Wine?) */ -ENTRY(cpu_gdt_table) - .quad 0x0000000000000000 /* NULL descriptor */ - .quad 0x00cf9b000000ffff /* __KERNEL32_CS */ - .quad 0x00af9b000000ffff /* __KERNEL_CS */ - .quad 0x00cf93000000ffff /* __KERNEL_DS */ - .quad 0x00cffb000000ffff /* __USER32_CS */ - .quad 0x00cff3000000ffff /* __USER_DS, __USER32_DS */ - .quad 0x00affb000000ffff /* __USER_CS */ - .quad 0x0 /* unused */ - .quad 0,0 /* TSS */ - .quad 0,0 /* LDT */ - .quad 0,0,0 /* three TLS descriptors */ - .quad 0x0000f40000000000 /* node/CPU stored in limit */ -gdt_end: - /* asm/segment.h:GDT_ENTRIES must match this */ - /* This should be a multiple of the cache line size */ - /* GDTs of other CPUs are now dynamically allocated */ - - /* zero the remaining page */ - .fill PAGE_SIZE / 8 - GDT_ENTRIES,8,0 - .section .bss, "aw", @nobits .align L1_CACHE_BYTES ENTRY(idt_table) diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c index fc1a56d..70ff071 100644 --- a/arch/x86/kernel/setup64.c +++ b/arch/x86/kernel/setup64.c @@ -202,11 +202,8 @@ void __cpuinit cpu_init (void) * Initialize the per-CPU GDT with the boot GDT, * and set up the GDT descriptor: */ - if (cpu) - memcpy(get_cpu_gdt_table(cpu), cpu_gdt_table, GDT_SIZE); - cpu_gdt_descr[cpu].size = GDT_SIZE; - load_gdt((const struct desc_ptr *)&cpu_gdt_descr[cpu]); + switch_to_new_gdt(); load_idt((const struct desc_ptr *)&idt_descr); memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8); diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c index 2aee441..c57223f 100644 --- a/arch/x86/kernel/setup_64.c +++ b/arch/x86/kernel/setup_64.c @@ -83,6 +83,22 @@ #define ARCH_SETUP #endif +/* We need valid kernel segments for data and code in long mode too + * IRET will check the segment types kkeil 2000/10/28 + * Also sysret mandates a special GDT layout + */ +/* The TLS descriptors are currently at a different place compared to i386. + Hopefully nobody expects them at a fixed place (Wine?) */ +DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = { + [GDT_ENTRY_KERNEL32_CS] = { { { 0x0000ffff, 0x00cf9b00 } } }, + [GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00af9b00 } } }, + [GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9300 } } }, + [GDT_ENTRY_DEFAULT_USER32_CS] = { { { 0x0000ffff, 0x00cffb00 } } }, + [GDT_ENTRY_DEFAULT_USER_DS] = { { { 0x0000ffff, 0x00cff300 } } }, + [GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00affb00 } } }, +} }; +EXPORT_PER_CPU_SYMBOL_GPL(gdt_page); + /* * Machine setup.. */ @@ -273,6 +289,17 @@ void __attribute__((weak)) __init memory_setup(void) machine_specific_memory_setup(); } +/* Current gdt points %fs at the "master" per-cpu area: after this, + * it's on the real one. */ +void switch_to_new_gdt(void) +{ + struct desc_ptr gdt_descr; + + gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id()); + gdt_descr.size = GDT_SIZE - 1; + load_gdt(&gdt_descr); +} + /* * setup_arch - architecture-specific boot-time initializations * diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 181b109..69639fd 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -849,14 +849,8 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu) .done = COMPLETION_INITIALIZER_ONSTACK(c_idle.done), }; INIT_WORK(&c_idle.work, do_fork_idle); -#ifdef CONFIG_X86_64 - /* allocate memory for gdts of secondary cpus. Hotplug is considered */ - if (!cpu_gdt_descr[cpu].address && - !(cpu_gdt_descr[cpu].address = get_zeroed_page(GFP_KERNEL))) { - printk(KERN_ERR "Failed to allocate GDT for CPU %d\n", cpu); - return -1; - } +#ifdef CONFIG_X86_64 /* Allocate node local memory for AP pdas */ if (cpu > 0) { boot_error = get_local_pda(cpu); @@ -1253,8 +1247,8 @@ void __init native_smp_prepare_boot_cpu(void) int me = smp_processor_id(); #ifdef CONFIG_X86_32 init_gdt(me); - switch_to_new_gdt(); #endif + switch_to_new_gdt(); /* already set me in cpu_online_map in boot_cpu_init() */ cpu_set(me, cpu_callout_map); per_cpu(cpu_state, me) = CPU_ONLINE; diff --git a/arch/x86/kernel/x8664_ksyms_64.c b/arch/x86/kernel/x8664_ksyms_64.c index 122885b..16a6751 100644 --- a/arch/x86/kernel/x8664_ksyms_64.c +++ b/arch/x86/kernel/x8664_ksyms_64.c @@ -60,8 +60,3 @@ EXPORT_SYMBOL(init_level4_pgt); EXPORT_SYMBOL(load_gs_index); EXPORT_SYMBOL(_proxy_pda); - -#ifdef CONFIG_PARAVIRT -/* Virtualized guests may want to use it */ -EXPORT_SYMBOL_GPL(cpu_gdt_descr); -#endif diff --git a/include/asm-x86/desc.h b/include/asm-x86/desc.h index b3875d4..07f9f2b 100644 --- a/include/asm-x86/desc.h +++ b/include/asm-x86/desc.h @@ -29,11 +29,17 @@ static inline void fill_ldt(struct desc_struct *desc, extern struct desc_ptr idt_descr; extern gate_desc idt_table[]; +struct gdt_page { + struct desc_struct gdt[GDT_ENTRIES]; +} __attribute__((aligned(PAGE_SIZE))); +DECLARE_PER_CPU(struct gdt_page, gdt_page); + +static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu) +{ + return per_cpu(gdt_page, cpu).gdt; +} + #ifdef CONFIG_X86_64 -extern struct desc_struct cpu_gdt_table[GDT_ENTRIES]; -extern struct desc_ptr cpu_gdt_descr[]; -/* the cpu gdt accessor */ -#define get_cpu_gdt_table(x) ((struct desc_struct *)cpu_gdt_descr[x].address) static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func, unsigned dpl, unsigned ist, unsigned seg) @@ -51,16 +57,6 @@ static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func, } #else -struct gdt_page { - struct desc_struct gdt[GDT_ENTRIES]; -} __attribute__((aligned(PAGE_SIZE))); -DECLARE_PER_CPU(struct gdt_page, gdt_page); - -static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu) -{ - return per_cpu(gdt_page, cpu).gdt; -} - static inline void pack_gate(gate_desc *gate, unsigned char type, unsigned long base, unsigned dpl, unsigned flags, unsigned short seg) diff --git a/include/asm-x86/segment.h b/include/asm-x86/segment.h index ed5131d..dfc8601 100644 --- a/include/asm-x86/segment.h +++ b/include/asm-x86/segment.h @@ -61,18 +61,14 @@ #define GDT_ENTRY_TLS_MAX (GDT_ENTRY_TLS_MIN + GDT_ENTRY_TLS_ENTRIES - 1) #define GDT_ENTRY_DEFAULT_USER_CS 14 -#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS * 8 + 3) #define GDT_ENTRY_DEFAULT_USER_DS 15 -#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS * 8 + 3) #define GDT_ENTRY_KERNEL_BASE 12 #define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE + 0) -#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8) #define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE + 1) -#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8) #define GDT_ENTRY_TSS (GDT_ENTRY_KERNEL_BASE + 4) #define GDT_ENTRY_LDT (GDT_ENTRY_KERNEL_BASE + 5) @@ -139,10 +135,11 @@ #else #include <asm/cache.h> -#define __KERNEL_CS 0x10 -#define __KERNEL_DS 0x18 +#define GDT_ENTRY_KERNEL32_CS 1 +#define GDT_ENTRY_KERNEL_CS 2 +#define GDT_ENTRY_KERNEL_DS 3 -#define __KERNEL32_CS 0x08 +#define __KERNEL32_CS (GDT_ENTRY_KERNEL32_CS * 8) /* * we cannot use the same code segment descriptor for user and kernel @@ -150,10 +147,10 @@ * The segment offset needs to contain a RPL. Grr. -AK * GDT layout to get 64bit syscall right (sysret hardcodes gdt offsets) */ - -#define __USER32_CS 0x23 /* 4*8+3 */ -#define __USER_DS 0x2b /* 5*8+3 */ -#define __USER_CS 0x33 /* 6*8+3 */ +#define GDT_ENTRY_DEFAULT_USER32_CS 4 +#define GDT_ENTRY_DEFAULT_USER_DS 5 +#define GDT_ENTRY_DEFAULT_USER_CS 6 +#define __USER32_CS (GDT_ENTRY_DEFAULT_USER32_CS * 8 + 3) #define __USER32_DS __USER_DS #define GDT_ENTRY_TSS 8 /* needs two entries */ @@ -175,6 +172,10 @@ #endif +#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8) +#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8) +#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS* 8 + 3) +#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS* 8 + 3) #ifndef CONFIG_PARAVIRT #define get_kernel_rpl() 0 #endif -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 05/15] x86: use initial_code for i386 2008-06-09 14:16 ` [PATCH 04/15] x86: move x86_64 gdt closer to i386 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 06/15] x86: boot secondary cpus through initial_code Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh x86_64 jumps to whatever is written in "initial_code" symbol, instead of a fixed address. Do it for i386 too. It will allow us to integrate more of the smp boot code. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/head_32.S | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index b98b338..ffb73a5 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -455,7 +455,10 @@ is386: movl $2,%ecx # set MP jmp initialize_secondary # all other CPUs call initialize_secondary 1: #endif /* CONFIG_SMP */ - jmp i386_start_kernel + jmp *(initial_code) +.align 4 +ENTRY(initial_code) + .long i386_start_kernel /* * We depend on ET to be correct. This checks for 287/387. -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 06/15] x86: boot secondary cpus through initial_code 2008-06-09 14:16 ` [PATCH 05/15] x86: use initial_code for i386 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 07/15] x86: clearing io_apic harmless for x86_64 Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh remove "initialize_secondary". Boot both architectures via initial_code. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/head_32.S | 2 +- arch/x86/kernel/smpboot.c | 25 +------------------------ 2 files changed, 2 insertions(+), 25 deletions(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index ffb73a5..f67e934 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -452,7 +452,7 @@ is386: movl $2,%ecx # set MP je 1f movl $(__KERNEL_PERCPU), %eax movl %eax,%fs # set this cpu's percpu - jmp initialize_secondary # all other CPUs call initialize_secondary + movl (stack_start), %esp 1: #endif /* CONFIG_SMP */ jmp *(initial_code) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 69639fd..0e4a44a 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -349,28 +349,6 @@ static void __cpuinit start_secondary(void *unused) cpu_idle(); } -#ifdef CONFIG_X86_32 -/* - * Everything has been set up for the secondary - * CPUs - they just need to reload everything - * from the task structure - * This function must not return. - */ -void __devinit initialize_secondary(void) -{ - /* - * We don't actually need to load the full TSS, - * basically just the stack pointer and the ip. - */ - - asm volatile( - "movl %0,%%esp\n\t" - "jmp *%1" - : - :"m" (current->thread.sp), "m" (current->thread.ip)); -} -#endif - static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_32 @@ -892,15 +870,14 @@ do_rest: #ifdef CONFIG_X86_32 per_cpu(current_task, cpu) = c_idle.idle; init_gdt(cpu); - c_idle.idle->thread.ip = (unsigned long) start_secondary; /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); #else cpu_pda(cpu)->pcurrent = c_idle.idle; load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread); - initial_code = (unsigned long)start_secondary; clear_tsk_thread_flag(c_idle.idle, TIF_FORK); #endif + initial_code = (unsigned long)start_secondary; stack_start.sp = (void *) c_idle.idle->thread.sp; /* start_ip had better be page-aligned! */ -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 07/15] x86: clearing io_apic harmless for x86_64 2008-06-09 14:16 ` [PATCH 06/15] x86: boot secondary cpus through initial_code Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 08/15] x86: remove ifdef from stepping Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh so remove ifdef. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/smpboot.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 0e4a44a..5a055a1 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1052,9 +1052,8 @@ static __init void disable_smp(void) { cpu_present_map = cpumask_of_cpu(0); cpu_possible_map = cpumask_of_cpu(0); -#ifdef CONFIG_X86_32 smpboot_clear_io_apic_irqs(); -#endif + if (smp_found_config) phys_cpu_present_map = physid_mask_of_physid(boot_cpu_physical_apicid); -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 08/15] x86: remove ifdef from stepping 2008-06-09 14:16 ` [PATCH 07/15] x86: clearing io_apic harmless for x86_64 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 09/15] x86: change __setup_vector_irq with setup_vector_irq Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh The stepping won't affect x86_64, since there are not x86_64 k7's or pentiums. So, although it adds to the binary size, remove the ifdef for smoother integration Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/smpboot.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 5a055a1..50d307e 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -351,7 +351,6 @@ static void __cpuinit start_secondary(void *unused) static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c) { -#ifdef CONFIG_X86_32 /* * Mask B, Pentium, but not Pentium MMX */ @@ -401,7 +400,6 @@ static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c) valid_k7: ; -#endif } static void __cpuinit smp_checks(void) -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 09/15] x86: change __setup_vector_irq with setup_vector_irq 2008-06-09 14:16 ` [PATCH 08/15] x86: remove ifdef from stepping Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 10/15] x86: provide connect_bsp_APIC for x86_64 Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh We create a version of it for i386, and then take the CONFIG_X86_64 ifdef out of the game. We could create a __setup_vector_irq for i386, but it would incur in an unnecessary lock taking. Moreover, it is better practice to only export setup_vector_irq anyway. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/io_apic_32.c | 5 +++++ arch/x86/kernel/io_apic_64.c | 9 ++++++++- arch/x86/kernel/smpboot.c | 11 ++--------- include/asm-x86/hw_irq.h | 2 +- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c index 9dbe7e9..7fc071f 100644 --- a/arch/x86/kernel/io_apic_32.c +++ b/arch/x86/kernel/io_apic_32.c @@ -1209,6 +1209,11 @@ static int assign_irq_vector(int irq) return vector; } + +void setup_vector_irq(int cpu) +{ +} + static struct irq_chip ioapic_chip; #define IOAPIC_AUTO -1 diff --git a/arch/x86/kernel/io_apic_64.c b/arch/x86/kernel/io_apic_64.c index 5ea376d..97d67c9 100644 --- a/arch/x86/kernel/io_apic_64.c +++ b/arch/x86/kernel/io_apic_64.c @@ -780,7 +780,7 @@ static void __clear_irq_vector(int irq) cpus_clear(cfg->domain); } -void __setup_vector_irq(int cpu) +static void __setup_vector_irq(int cpu) { /* Initialize vector_irq on a new cpu */ /* This function must be called with vector_lock held */ @@ -803,6 +803,13 @@ void __setup_vector_irq(int cpu) } } +void setup_vector_irq(int cpu) +{ + spin_lock(&vector_lock); + __setup_vector_irq(smp_processor_id()); + spin_unlock(&vector_lock); +} + static struct irq_chip ioapic_chip; diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 50d307e..9d8dfec 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -329,15 +329,8 @@ static void __cpuinit start_secondary(void *unused) * smp_call_function(). */ lock_ipi_call_lock(); -#ifdef CONFIG_X86_64 - spin_lock(&vector_lock); - - /* Setup the per cpu irq handling data structures */ - __setup_vector_irq(smp_processor_id()); - /* - * Allow the master to continue. - */ - spin_unlock(&vector_lock); +#ifdef CONFIG_X86_IO_APIC + setup_vector_irq(smp_processor_id()); #endif cpu_set(smp_processor_id(), cpu_online_map); unlock_ipi_call_lock(); diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h index 1428b41..18f067c 100644 --- a/include/asm-x86/hw_irq.h +++ b/include/asm-x86/hw_irq.h @@ -97,9 +97,9 @@ extern void (*const interrupt[NR_IRQS])(void); #else typedef int vector_irq_t[NR_VECTORS]; DECLARE_PER_CPU(vector_irq_t, vector_irq); -extern void __setup_vector_irq(int cpu); extern spinlock_t vector_lock; #endif +extern void setup_vector_irq(int cpu); #endif /* !ASSEMBLY_ */ -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 10/15] x86: provide connect_bsp_APIC for x86_64 2008-06-09 14:16 ` [PATCH 09/15] x86: change __setup_vector_irq with setup_vector_irq Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh Although it is not really needed, we provide it to get closer to i386. ifdefs around it are removed in smpboot.c Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/apic_64.c | 10 ++++++++++ arch/x86/kernel/smpboot.c | 5 +---- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c index 310bec1..8d62387 100644 --- a/arch/x86/kernel/apic_64.c +++ b/arch/x86/kernel/apic_64.c @@ -942,6 +942,8 @@ int __init APIC_init_uniprocessor(void) verify_local_APIC(); + connect_bsp_APIC(); + phys_cpu_present_map = physid_mask_of_physid(boot_cpu_physical_apicid); apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid)); @@ -1023,6 +1025,14 @@ asmlinkage void smp_error_interrupt(void) irq_exit(); } +/** + * * connect_bsp_APIC - attach the APIC to the interrupt system + * */ +void __init connect_bsp_APIC(void) +{ + enable_apic_mode(); +} + void disconnect_bsp_APIC(int virt_wire_setup) { /* Go back to Virtual Wire compatibility mode */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 9d8dfec..c4bb997 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1117,9 +1117,7 @@ static int __init smp_sanity_check(unsigned max_cpus) localise_nmi_watchdog(); -#ifdef CONFIG_X86_32 connect_bsp_APIC(); -#endif setup_local_APIC(); end_local_APIC_setup(); return -1; @@ -1174,9 +1172,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) } preempt_enable(); -#ifdef CONFIG_X86_32 connect_bsp_APIC(); -#endif + /* * Switch from PIC to APIC mode. */ -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 14:16 ` [PATCH 10/15] x86: provide connect_bsp_APIC for x86_64 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 12/15] x86: change naming to match x86_64 Glauber Costa 2008-06-09 15:23 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Maciej W. Rozycki 0 siblings, 2 replies; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh Do it, instead of keeping in io_apic_32.c. This is the way x86_64 already does. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/apic_32.c | 6 +----- arch/x86/kernel/io_apic_32.c | 2 +- arch/x86/kernel/smpboot.c | 3 ++- include/asm-x86/hw_irq.h | 3 --- 4 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c index fa8cf79..a06442a 100644 --- a/arch/x86/kernel/apic_32.c +++ b/arch/x86/kernel/apic_32.c @@ -1277,11 +1277,7 @@ int __init APIC_init_uniprocessor(void) #endif localise_nmi_watchdog(); end_local_APIC_setup(); -#ifdef CONFIG_X86_IO_APIC - if (smp_found_config) - if (!skip_ioapic_setup && nr_ioapics) - setup_IO_APIC(); -#endif + setup_boot_clock(); return 0; diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c index 7fc071f..cb79ce3 100644 --- a/arch/x86/kernel/io_apic_32.c +++ b/arch/x86/kernel/io_apic_32.c @@ -1606,7 +1606,7 @@ void /*__init*/ print_PIC(void) #endif /* 0 */ -static void __init enable_IO_APIC(void) +void __init enable_IO_APIC(void) { union IO_APIC_reg_01 reg_01; int i8259_apic, i8259_pin; diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c4bb997..49d2900 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1179,13 +1179,14 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) */ setup_local_APIC(); -#ifdef CONFIG_X86_64 +#ifdef CONFIG_X86_IO_APIC /* * Enable IO APIC before setting up error vector */ if (!skip_ioapic_setup && nr_ioapics) enable_IO_APIC(); #endif + end_local_APIC_setup(); map_cpu_to_logical_apicid(); diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h index 18f067c..31b56cd 100644 --- a/include/asm-x86/hw_irq.h +++ b/include/asm-x86/hw_irq.h @@ -66,10 +66,7 @@ extern void disable_IO_APIC(void); extern void print_IO_APIC(void); extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn); extern void setup_ioapic_dest(void); - -#ifdef CONFIG_X86_64 extern void enable_IO_APIC(void); -#endif /* IPI functions */ extern void send_IPI_self(int vector); -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 12/15] x86: change naming to match x86_64 2008-06-09 14:16 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 13/15] x86: remove cpu from maps Glauber Costa 2008-06-09 15:23 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Maciej W. Rozycki 1 sibling, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh, Glauber de Oliveira Costa Change unmap_cpu_to_logical_apicid to numa_remove_cpu. Besides being shorter, it is the same name x86_64 uses. We can save an ifdef in the code this way. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- arch/x86/kernel/smpboot.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 49d2900..6316d30 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -181,13 +181,12 @@ static void map_cpu_to_logical_apicid(void) map_cpu_to_node(cpu, node); } -static void unmap_cpu_to_logical_apicid(int cpu) +static void numa_remove_cpu(int cpu) { cpu_2_logical_apicid[cpu] = BAD_APICID; unmap_cpu_to_node(cpu); } #else -#define unmap_cpu_to_logical_apicid(cpu) do {} while (0) #define map_cpu_to_logical_apicid() do {} while (0) #endif @@ -945,10 +944,7 @@ restore_state: if (boot_error) { /* Try to put things back the way they were before ... */ - unmap_cpu_to_logical_apicid(cpu); -#ifdef CONFIG_X86_64 numa_remove_cpu(cpu); /* was set by numa_add_cpu */ -#endif cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */ cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */ cpu_clear(cpu, cpu_possible_map); @@ -1246,7 +1242,7 @@ void cpu_exit_clear(void) cpu_clear(cpu, cpu_callout_map); cpu_clear(cpu, cpu_callin_map); - unmap_cpu_to_logical_apicid(cpu); + numa_remove_cpu(cpu); } # endif /* CONFIG_X86_32 */ -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 13/15] x86: remove cpu from maps 2008-06-09 14:16 ` [PATCH 12/15] x86: change naming to match x86_64 Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 14/15] x86: move cpu_exit_clear to process_32.c Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh during cpu disable, take cpus out of all maps in i386, instead of just the online map. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/smpboot.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 6316d30..f6b99b6 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1321,13 +1321,11 @@ __init void prefill_possible_map(void) static void __ref remove_cpu_from_maps(int cpu) { cpu_clear(cpu, cpu_online_map); -#ifdef CONFIG_X86_64 cpu_clear(cpu, cpu_callout_map); cpu_clear(cpu, cpu_callin_map); /* was set by cpu_init() */ clear_bit(cpu, (unsigned long *)&cpu_initialized); numa_remove_cpu(cpu); -#endif } int __cpu_disable(void) -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 14/15] x86: move cpu_exit_clear to process_32.c 2008-06-09 14:16 ` [PATCH 13/15] x86: remove cpu from maps Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 2008-06-09 14:16 ` [PATCH 15/15] x86: take load_sp0 out of smpboot.c Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh Take it out of smpboot.c, and move it to process_32.c, closer to its only user. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/process_32.c | 16 ++++++++++++++++ arch/x86/kernel/smpboot.c | 19 +------------------ include/asm-x86/numa_32.h | 1 + include/asm-x86/smp.h | 1 - 4 files changed, 18 insertions(+), 19 deletions(-) diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 1aaca76..99f8e7d 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -128,6 +128,22 @@ EXPORT_SYMBOL(default_idle); #ifdef CONFIG_HOTPLUG_CPU #include <asm/nmi.h> + +static void cpu_exit_clear(void) +{ + int cpu = raw_smp_processor_id(); + + idle_task_exit(); + + cpu_uninit(); + irq_ctx_exit(cpu); + + cpu_clear(cpu, cpu_callout_map); + cpu_clear(cpu, cpu_callin_map); + + numa_remove_cpu(cpu); +} + /* We don't actually take CPU down, just spin without interrupts. */ static inline void play_dead(void) { diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index f6b99b6..7952c39 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -181,7 +181,7 @@ static void map_cpu_to_logical_apicid(void) map_cpu_to_node(cpu, node); } -static void numa_remove_cpu(int cpu) +void numa_remove_cpu(int cpu) { cpu_2_logical_apicid[cpu] = BAD_APICID; unmap_cpu_to_node(cpu); @@ -1229,23 +1229,6 @@ void __init native_smp_cpus_done(unsigned int max_cpus) #ifdef CONFIG_HOTPLUG_CPU -# ifdef CONFIG_X86_32 -void cpu_exit_clear(void) -{ - int cpu = raw_smp_processor_id(); - - idle_task_exit(); - - cpu_uninit(); - irq_ctx_exit(cpu); - - cpu_clear(cpu, cpu_callout_map); - cpu_clear(cpu, cpu_callin_map); - - numa_remove_cpu(cpu); -} -# endif /* CONFIG_X86_32 */ - static void remove_siblinginfo(int cpu) { int sibling; diff --git a/include/asm-x86/numa_32.h b/include/asm-x86/numa_32.h index 03d0f7a..205015f 100644 --- a/include/asm-x86/numa_32.h +++ b/include/asm-x86/numa_32.h @@ -2,6 +2,7 @@ #define _ASM_X86_32_NUMA_H 1 extern int pxm_to_nid(int pxm); +extern void numa_remove_cpu(int cpu); #ifdef CONFIG_NUMA extern void __init remap_numa_kva(void); diff --git a/include/asm-x86/smp.h b/include/asm-x86/smp.h index fc10073..fad45f6 100644 --- a/include/asm-x86/smp.h +++ b/include/asm-x86/smp.h @@ -188,7 +188,6 @@ static inline int hard_smp_processor_id(void) #endif /* CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_HOTPLUG_CPU -extern void cpu_exit_clear(void); extern void cpu_uninit(void); #endif -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH 15/15] x86: take load_sp0 out of smpboot.c 2008-06-09 14:16 ` [PATCH 14/15] x86: move cpu_exit_clear to process_32.c Glauber Costa @ 2008-06-09 14:16 ` Glauber Costa 0 siblings, 0 replies; 38+ messages in thread From: Glauber Costa @ 2008-06-09 14:16 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, tglx, mingo, hugh there's no particular reason to do load_sp0 in different places for i386 and x86_64. They should all be in cpu_init. Right now, cpu_init itself is not integrated, but with this patch, the code becomes closer to each other, making in easier to integrate when the time comes. Furthermore, although doing it in do_boot_cpu for x86_64 is fine, since it's only a copy, load_sp0 should be executed in the cpu it refers to anyway. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/setup64.c | 1 + arch/x86/kernel/smpboot.c | 1 - 2 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c index 70ff071..151d315 100644 --- a/arch/x86/kernel/setup64.c +++ b/arch/x86/kernel/setup64.c @@ -247,6 +247,7 @@ void __cpuinit cpu_init (void) BUG(); enter_lazy_tlb(&init_mm, me); + load_sp0(t, ¤t->thread); set_tss_desc(cpu, t); load_TR_desc(); load_LDT(&init_mm.context); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 7952c39..371113a 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -864,7 +864,6 @@ do_rest: irq_ctx_init(cpu); #else cpu_pda(cpu)->pcurrent = c_idle.idle; - load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread); clear_tsk_thread_flag(c_idle.idle, TIF_FORK); #endif initial_code = (unsigned long)start_secondary; -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 14:16 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Glauber Costa 2008-06-09 14:16 ` [PATCH 12/15] x86: change naming to match x86_64 Glauber Costa @ 2008-06-09 15:23 ` Maciej W. Rozycki 2008-06-09 15:52 ` Glauber Costa 2008-06-09 19:44 ` Glauber Costa 1 sibling, 2 replies; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-09 15:23 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-kernel, akpm, tglx, mingo, hugh On Mon, 9 Jun 2008, Glauber Costa wrote: > Do it, instead of keeping in io_apic_32.c. This is the way x86_64 > already does. This change looks wrong -- is native_smp_prepare_cpus() at all run on !CONFIG_SMP? I don't think so. You have to do this differently. Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 15:23 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Maciej W. Rozycki @ 2008-06-09 15:52 ` Glauber Costa 2008-06-09 19:44 ` Glauber Costa 1 sibling, 0 replies; 38+ messages in thread From: Glauber Costa @ 2008-06-09 15:52 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: linux-kernel, akpm, tglx, mingo, hugh Maciej W. Rozycki wrote: > On Mon, 9 Jun 2008, Glauber Costa wrote: > >> Do it, instead of keeping in io_apic_32.c. This is the way x86_64 >> already does. > > This change looks wrong -- is native_smp_prepare_cpus() at all run on > !CONFIG_SMP? I don't think so. You have to do this differently. You're right. I'll update this one. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 15:23 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Maciej W. Rozycki 2008-06-09 15:52 ` Glauber Costa @ 2008-06-09 19:44 ` Glauber Costa 2008-06-09 20:12 ` Maciej W. Rozycki 1 sibling, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 19:44 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: linux-kernel, akpm, tglx, mingo, hugh [-- Attachment #1: Type: text/plain, Size: 443 bytes --] Maciej W. Rozycki wrote: > On Mon, 9 Jun 2008, Glauber Costa wrote: > >> Do it, instead of keeping in io_apic_32.c. This is the way x86_64 >> already does. > > This change looks wrong -- is native_smp_prepare_cpus() at all run on > !CONFIG_SMP? I don't think so. You have to do this differently. > > Maciej How about this one instead? This one does enable_IO_APIC in the init_uniprocessor too, and should account for the !smp case. [-- Attachment #2: 0001-x86-move-enabling-of-io_apic-to-prepare_cpus.patch --] [-- Type: text/x-patch, Size: 2526 bytes --] >From 87f8366012e1dc3cf7212e241b0aa552d7825338 Mon Sep 17 00:00:00 2001 From: Glauber Costa <gcosta@redhat.com> Date: Wed, 28 May 2008 17:04:12 -0300 Subject: [PATCH] x86: move enabling of io_apic to prepare_cpus Do it, instead of keeping in io_apic_32.c. This is the way x86_64 already does. Signed-off-by: Glauber Costa <gcosta@redhat.com> --- arch/x86/kernel/apic_32.c | 3 +++ arch/x86/kernel/io_apic_32.c | 4 ++-- arch/x86/kernel/smpboot.c | 2 +- include/asm-x86/hw_irq.h | 2 -- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c index fa8cf79..8385dc6 100644 --- a/arch/x86/kernel/apic_32.c +++ b/arch/x86/kernel/apic_32.c @@ -1273,6 +1273,9 @@ int __init APIC_init_uniprocessor(void) setup_local_APIC(); #ifdef CONFIG_X86_IO_APIC + if (!skip_ioapic_setup && nr_ioapics) + enable_IO_APIC(); + if (!smp_found_config || skip_ioapic_setup || !nr_ioapics) #endif localise_nmi_watchdog(); diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c index 7fc071f..fa4f0a0 100644 --- a/arch/x86/kernel/io_apic_32.c +++ b/arch/x86/kernel/io_apic_32.c @@ -1606,7 +1606,7 @@ void /*__init*/ print_PIC(void) #endif /* 0 */ -static void __init enable_IO_APIC(void) +void __init enable_IO_APIC(void) { union IO_APIC_reg_01 reg_01; int i8259_apic, i8259_pin; @@ -2295,7 +2295,7 @@ void __init setup_IO_APIC(void) for (i = first_system_vector; i < NR_VECTORS; i++) set_bit(i, used_vectors); - enable_IO_APIC(); + /* calling enable_IO_APIC() is moved to setup_local_APIC for BP */ if (acpi_ioapic) io_apic_irqs = ~0; /* all IRQs go through IOAPIC */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index f5ac5b2..e537c7b 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1175,7 +1175,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) */ setup_local_APIC(); -#ifdef CONFIG_X86_64 +#ifdef CONFIG_X86_IO_APIC /* * Enable IO APIC before setting up error vector */ diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h index 18f067c..4c81d1b 100644 --- a/include/asm-x86/hw_irq.h +++ b/include/asm-x86/hw_irq.h @@ -67,9 +67,7 @@ extern void print_IO_APIC(void); extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn); extern void setup_ioapic_dest(void); -#ifdef CONFIG_X86_64 extern void enable_IO_APIC(void); -#endif /* IPI functions */ extern void send_IPI_self(int vector); -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 19:44 ` Glauber Costa @ 2008-06-09 20:12 ` Maciej W. Rozycki 2008-06-09 20:53 ` Yinghai Lu 2008-06-09 21:02 ` Glauber Costa 0 siblings, 2 replies; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-09 20:12 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-kernel, akpm, tglx, mingo, hugh On Mon, 9 Jun 2008, Glauber Costa wrote: > This one does enable_IO_APIC in the init_uniprocessor too, and should > account for the !smp case. Hmm, it looks a little bit better, but why do you want to call enable_IO_APIC() separately in the first place? There is a comment stating: "Enable IO APIC before setting up error vector," but why is it needed on 64-bit systems? Especially as the very same system may run a 32-bit kernel and then it suddenly would not have to do this anymore? Strange... Also since you are cleaning up this code -- why don't you actually take the opportunity and get rid of the horrible #ifdefs interspersed throughout? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 20:12 ` Maciej W. Rozycki @ 2008-06-09 20:53 ` Yinghai Lu 2008-06-09 21:00 ` Maciej W. Rozycki 2008-06-09 21:02 ` Glauber Costa 1 sibling, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2008-06-09 20:53 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Mon, Jun 9, 2008 at 1:12 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote: > On Mon, 9 Jun 2008, Glauber Costa wrote: > >> This one does enable_IO_APIC in the init_uniprocessor too, and should >> account for the !smp case. > > Hmm, it looks a little bit better, but why do you want to call > enable_IO_APIC() separately in the first place? There is a comment > stating: "Enable IO APIC before setting up error vector," but why is it > needed on 64-bit systems? Especially as the very same system may run a > 32-bit kernel and then it suddenly would not have to do this anymore? > Strange... that enable_IO_APIC() only call record old ioapic/pin for timer and clear_IO_APIC. we need clear_IO_APIC before enable setup error vector, in case there is wrong setting in ioapic by BIOS... YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 20:53 ` Yinghai Lu @ 2008-06-09 21:00 ` Maciej W. Rozycki 2008-06-10 2:46 ` Maciej W. Rozycki 0 siblings, 1 reply; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-09 21:00 UTC (permalink / raw) To: Yinghai Lu; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Mon, 9 Jun 2008, Yinghai Lu wrote: > we need clear_IO_APIC before enable setup error vector, in case there > is wrong setting in ioapic by BIOS... Fair enough, although it is still interesting why it would only trigger in the 64-bit mode and why shouldn't the BIOS be fixed instead. Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 21:00 ` Maciej W. Rozycki @ 2008-06-10 2:46 ` Maciej W. Rozycki 2008-06-10 5:08 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-10 2:46 UTC (permalink / raw) To: Yinghai Lu; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Mon, 9 Jun 2008, Maciej W. Rozycki wrote: > > we need clear_IO_APIC before enable setup error vector, in case there > > is wrong setting in ioapic by BIOS... > > Fair enough, although it is still interesting why it would only trigger > in the 64-bit mode and why shouldn't the BIOS be fixed instead. Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 2:46 ` Maciej W. Rozycki @ 2008-06-10 5:08 ` Yinghai Lu 2008-06-10 13:00 ` Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2008-06-10 5:08 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Mon, Jun 9, 2008 at 7:46 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote: > On Mon, 9 Jun 2008, Maciej W. Rozycki wrote: > >> > we need clear_IO_APIC before enable setup error vector, in case there >> > is wrong setting in ioapic by BIOS... >> >> Fair enough, although it is still interesting why it would only trigger >> in the 64-bit mode and why shouldn't the BIOS be fixed instead. > > Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? config X86_LOCAL_APIC def_bool y depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER) || X86_GENERICARCH)) config X86_IO_APIC def_bool y depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH)) for 64bit, those are all set. for 32bit, may need to null stub if X86_IO_APIC is not set YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 5:08 ` Yinghai Lu @ 2008-06-10 13:00 ` Glauber Costa 2008-06-10 13:30 ` Maciej W. Rozycki 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-10 13:00 UTC (permalink / raw) To: Yinghai Lu; +Cc: Maciej W. Rozycki, linux-kernel, akpm, tglx, mingo, hugh Yinghai Lu wrote: > On Mon, Jun 9, 2008 at 7:46 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote: >> On Mon, 9 Jun 2008, Maciej W. Rozycki wrote: >> >>>> we need clear_IO_APIC before enable setup error vector, in case there >>>> is wrong setting in ioapic by BIOS... >>> Fair enough, although it is still interesting why it would only trigger >>> in the 64-bit mode and why shouldn't the BIOS be fixed instead. >> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? > > config X86_LOCAL_APIC > def_bool y > depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS || > SMP) && !X86_VOYAGER) || X86_GENERICARCH)) > > config X86_IO_APIC > def_bool y > depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && > !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH)) > > for 64bit, those are all set. > > for 32bit, may need to null stub if X86_IO_APIC is not set > > YH Fair Enough. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 13:00 ` Glauber Costa @ 2008-06-10 13:30 ` Maciej W. Rozycki 2008-06-10 19:09 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-10 13:30 UTC (permalink / raw) To: Glauber Costa; +Cc: Yinghai Lu, linux-kernel, akpm, tglx, mingo, hugh On Tue, 10 Jun 2008, Glauber Costa wrote: > >> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? > > > > config X86_LOCAL_APIC > > def_bool y > > depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS || > > SMP) && !X86_VOYAGER) || X86_GENERICARCH)) > > > > config X86_IO_APIC > > def_bool y > > depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && > > !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH)) > > > > for 64bit, those are all set. > > > > for 32bit, may need to null stub if X86_IO_APIC is not set > > > > YH > Fair Enough. That does not answer the question what to do with a 32-bit kernel run on a system that requires the fixup in the 64-bit mode. Papering over such firmware bugs in the kernel encourages hardware vendors to maintain the breakage. Note that the I/O APIC comes out of reset with all the redirection entries masked out, so if any error conditions are triggered before Linux configures the chip, they are a result of a deliberate misprogramming done to the chip by the firmware. Does anyone have a reference to the original problem report which lead to this workaround having been put in place? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 13:30 ` Maciej W. Rozycki @ 2008-06-10 19:09 ` Yinghai Lu 2008-06-10 19:36 ` Maciej W. Rozycki 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2008-06-10 19:09 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, Jun 10, 2008 at 6:30 AM, Maciej W. Rozycki <macro@linux-mips.org> wrote: > On Tue, 10 Jun 2008, Glauber Costa wrote: > >> >> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not? >> > >> > config X86_LOCAL_APIC >> > def_bool y >> > depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS || >> > SMP) && !X86_VOYAGER) || X86_GENERICARCH)) >> > >> > config X86_IO_APIC >> > def_bool y >> > depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && >> > !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH)) >> > >> > for 64bit, those are all set. >> > >> > for 32bit, may need to null stub if X86_IO_APIC is not set >> > >> > YH >> Fair Enough. > > That does not answer the question what to do with a 32-bit kernel run on > a system that requires the fixup in the 64-bit mode. Papering over such > firmware bugs in the kernel encourages hardware vendors to maintain the > breakage. kernel should not assume io apic register is set right by firmware. and kernel actually doesn't trust them, and clear the io apic registers. that patch just move that early before enable error vector. YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 19:09 ` Yinghai Lu @ 2008-06-10 19:36 ` Maciej W. Rozycki 2008-06-10 19:49 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-10 19:36 UTC (permalink / raw) To: Yinghai Lu; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, 10 Jun 2008, Yinghai Lu wrote: > kernel should not assume io apic register is set right by firmware. What is the basis of this assumption? Linux generally assumes the chipset components have been placed by the firmware into a consistent state. That does not necessarily mean suitable for Linux, hence the need to reconfigure a bit here or there, but there should be no need to touch components that are not going to be used by Linux directly. The I/O APIC is no different. > and kernel actually doesn't trust them, and clear the io apic registers. The I/O APIC registers are cleared, because this is the only way you can assure inconsistent configuration does not happen during reconfiguration. For example two inputs using the same vector or set up into the ExtINTA mode. The original intent of the code was not to paper over breakage in the firmware. You are trying to change it and it can be done, but it has to be justified well. > that patch just move that early before enable error vector. What I am saying repeatedly is clearing of the I/O APIC is not guaranteed to happen for all the possible cases of Linux configuration. Which means this is a partial solution only -- please try to propose a better one or provide the original problem report so that someone else can have a look at it. What's triggering the error interrupt for example? Is it recoverable? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 19:36 ` Maciej W. Rozycki @ 2008-06-10 19:49 ` Yinghai Lu 2008-06-11 0:29 ` Maciej W. Rozycki 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2008-06-10 19:49 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, Jun 10, 2008 at 12:36 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote: > On Tue, 10 Jun 2008, Yinghai Lu wrote: > >> kernel should not assume io apic register is set right by firmware. > > What is the basis of this assumption? Linux generally assumes the > chipset components have been placed by the firmware into a consistent > state. That does not necessarily mean suitable for Linux, hence the need > to reconfigure a bit here or there, but there should be no need to touch > components that are not going to be used by Linux directly. The I/O APIC > is no different. > >> and kernel actually doesn't trust them, and clear the io apic registers. > > The I/O APIC registers are cleared, because this is the only way you can > assure inconsistent configuration does not happen during reconfiguration. > For example two inputs using the same vector or set up into the ExtINTA > mode. The original intent of the code was not to paper over breakage in > the firmware. You are trying to change it and it can be done, but it has > to be justified well. > >> that patch just move that early before enable error vector. > > What I am saying repeatedly is clearing of the I/O APIC is not guaranteed > to happen for all the possible cases of Linux configuration. Which means > this is a partial solution only -- please try to propose a better one or > provide the original problem report so that someone else can have a look > at it. What's triggering the error interrupt for example? Is it > recoverable? ExtINT is routed to ioapic pin0. but the dst is set to 0. and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu is set to 4 instead of 0 YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-10 19:49 ` Yinghai Lu @ 2008-06-11 0:29 ` Maciej W. Rozycki 2008-06-11 2:32 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-11 0:29 UTC (permalink / raw) To: Yinghai Lu; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, 10 Jun 2008, Yinghai Lu wrote: > ExtINT is routed to ioapic pin0. but the dst is set to 0. > and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu > is set to 4 instead of 0 Thanks for the info. Let me understand the situation better: local APIC IDs are preassigned by the firmware based on their "socket address" and the socket where the lowest numbered quad would be is empty. Nevertheless the firmware sets the destination ID of the ExtINTA interrupt in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that correct? But it would mean the Virtual Wire interrupt delivery would not work, or is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is set up for ExtINTA delivery as well? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-11 0:29 ` Maciej W. Rozycki @ 2008-06-11 2:32 ` Yinghai Lu 2008-06-11 12:57 ` Maciej W. Rozycki 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2008-06-11 2:32 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, Jun 10, 2008 at 5:29 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote: > On Tue, 10 Jun 2008, Yinghai Lu wrote: > >> ExtINT is routed to ioapic pin0. but the dst is set to 0. >> and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu >> is set to 4 instead of 0 > > Thanks for the info. Let me understand the situation better: local APIC > IDs are preassigned by the firmware based on their "socket address" and > the socket where the lowest numbered quad would be is empty. > Nevertheless the firmware sets the destination ID of the ExtINTA interrupt > in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that > correct? Yes after I asked bios engineer to change the dest apic id to 4, the error is gone. before clear_IO_APIC() number of MP IRQ sources: 15. number of IO-APIC #0 registers: 24. number of IO-APIC #1 registers: 7. number of IO-APIC #2 registers: 7. number of IO-APIC #3 registers: 24. testing the IO APIC....................... IO APIC #0...... .... register #00: 00000000 ....... : physical APIC id: 00 .... register #01: 00170011 ....... : max redirection entries: 0017 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect: 00 004 0 0 0 0 0 0 7 00 01 000 1 0 0 0 0 0 0 00 02 000 1 0 0 0 0 0 0 00 03 000 1 0 0 0 0 0 0 00 04 000 1 0 0 0 0 0 0 00 05 000 1 0 0 0 0 0 0 00 06 000 1 0 0 0 0 0 0 00 07 000 1 0 0 0 0 0 0 00 08 000 1 0 0 0 0 0 0 00 09 000 1 0 0 0 0 0 0 00 0a 000 1 0 0 0 0 0 0 00 0b 000 1 0 0 0 0 0 0 00 0c 000 1 0 0 0 0 0 0 00 0d 000 1 0 0 0 0 0 0 00 0e 000 1 0 0 0 0 0 0 00 0f 000 1 0 0 0 0 0 0 00 10 000 1 0 0 0 0 0 0 00 11 000 1 0 0 0 0 0 0 00 12 000 1 0 0 0 0 0 0 00 13 000 1 0 0 0 0 0 0 00 14 000 1 0 0 0 0 0 0 00 15 000 1 0 0 0 0 0 0 00 16 000 1 0 0 0 0 0 0 00 17 000 1 0 0 0 0 0 0 00 > > But it would mean the Virtual Wire interrupt delivery would not work, or > is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is > set up for ExtINTA delivery as well? it doesn't need to virtual wire for timer (ExtInt), timer is hpet and routed to ioapic pin2. Just know at first BIOS engineer refused to change that to 4, because other os is not happy. YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-11 2:32 ` Yinghai Lu @ 2008-06-11 12:57 ` Maciej W. Rozycki 0 siblings, 0 replies; 38+ messages in thread From: Maciej W. Rozycki @ 2008-06-11 12:57 UTC (permalink / raw) To: Yinghai Lu; +Cc: Glauber Costa, linux-kernel, akpm, tglx, mingo, hugh On Tue, 10 Jun 2008, Yinghai Lu wrote: > > Thanks for the info. Let me understand the situation better: local APIC > > IDs are preassigned by the firmware based on their "socket address" and > > the socket where the lowest numbered quad would be is empty. > > Nevertheless the firmware sets the destination ID of the ExtINTA interrupt > > in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that > > correct? > > Yes > > after I asked bios engineer to change the dest apic id to 4, the error > is gone. Thanks for the clarification. > > But it would mean the Virtual Wire interrupt delivery would not work, or > > is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is > > set up for ExtINTA delivery as well? > > it doesn't need to virtual wire for timer (ExtInt), timer is hpet and > routed to ioapic pin2. That's not what I asked about -- the timer does not matter here. The Virtual Wire mode is a configuration, where one input of one APIC in the system is set up for the ExtINTA mode and acts transparently with the system software having no need to know about it. Instead a pair of legacy 8259A chips is used to deliver interrupts, including claiming the INTA cycles, providing vectors and prioritising sources, as defined in the PC/AT architecture. Many pieces of software rely on the 8259A PICs, either because they predate the APIC or because they have no means to make use of multiprocessor features anyway. They include various versions of DOS together with software run in that environment (as DOS programs quite frequently drive hardware at the low level), many versions of the Microsoft Windows system as well as other software. For these a legacy mode, either the Virtual Wire mode, or a mode where 8259A interrupts are delivered directly to one processor's INT line has to be implemented as mandated both by the Multiprocessor Specification and the Advanced Configuration and Power Interface Specification. Coming back to my question -- how is such a mode implemented in the affected system? Clearly the route through the I/O APIC is not going to work and moreover, the chip clutters the system with broken interrupt messages each time the 8259A signals an interrupt. Please note Linux can use the Virtual Wire mode in the APIC/SMP mode too, if requested by specifying the "noapic" command-line option -- have you verified the option works correctly with the affected system? > Just know at first BIOS engineer refused to change that to 4, because > other os is not happy. Well, this is just a confirmation my attitude is correct -- such problems should not be papered over, because vendors will deny their existence then. At least a complaint message should be printed so that users have an opportunity to see it and ask their hardware supplier for an explanation. In this case, a workaround for the 64-bit mode happens to be quite cheap, but it should be extended to cover the 32-bit mode as well. The only solution for the 32-bit mode I have in mind would lead to a waste of resources for many users of good hardware. And this because of somebody's sloppiness -- as I have written -- this better be well justified. Unless you have precise means to identify this system -- in that case I think reconfiguring the bootstrap processor's local APIC ID to 0 would be the right approach. Have you tried it? Maciej ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus 2008-06-09 20:12 ` Maciej W. Rozycki 2008-06-09 20:53 ` Yinghai Lu @ 2008-06-09 21:02 ` Glauber Costa 1 sibling, 0 replies; 38+ messages in thread From: Glauber Costa @ 2008-06-09 21:02 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: linux-kernel, akpm, tglx, mingo, hugh Maciej W. Rozycki wrote: > On Mon, 9 Jun 2008, Glauber Costa wrote: > >> This one does enable_IO_APIC in the init_uniprocessor too, and should >> account for the !smp case. > > Hmm, it looks a little bit better, but why do you want to call > enable_IO_APIC() separately in the first place? There is a comment > stating: "Enable IO APIC before setting up error vector," but why is it > needed on 64-bit systems? Especially as the very same system may run a > 32-bit kernel and then it suddenly would not have to do this anymore? > Strange... This was reported by Yinghai, but I think he already answered to that. > Also since you are cleaning up this code -- why don't you actually take > the opportunity and get rid of the horrible #ifdefs interspersed > throughout? throughout where? They're all over the place ;) My next target would be per-cpu data. But that's because there's _a lot_ of code in the tree that got ifdefs between 32 and 64-bit because of differences in that, specially irq statistics. A macro would do, but if we're gonna do it, let's do it right. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 14:16 ` [PATCH 03/15] x86: remove early_gdt_descr reference Glauber Costa 2008-06-09 14:16 ` [PATCH 04/15] x86: move x86_64 gdt closer to i386 Glauber Costa @ 2008-06-09 15:23 ` James Bottomley 2008-06-09 15:49 ` Glauber Costa 1 sibling, 1 reply; 38+ messages in thread From: James Bottomley @ 2008-06-09 15:23 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-kernel, akpm, tglx, mingo, hugh On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote: > since we use switch_to_new_gdt, there is no point > in assigning early_gdt_descr except for the first > assignment, which is done manually. What makes you think you can do this? If you don't update the early boot gdt, they all end up using the Boot CPU one. The problem with this is that there's a time from start_secondary to switch_to_new_gdt where the per cpu selector (%fs) and the pda selector (%gs) are those of the boot CPU. The former isn't a problem but the CPU number is in the latter, and it's used in that path before we get to the initialisation. James ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 15:23 ` [PATCH 03/15] x86: remove early_gdt_descr reference James Bottomley @ 2008-06-09 15:49 ` Glauber Costa 2008-06-09 17:20 ` James Bottomley 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 15:49 UTC (permalink / raw) To: James Bottomley; +Cc: linux-kernel, akpm, tglx, mingo, hugh James Bottomley wrote: > On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote: >> since we use switch_to_new_gdt, there is no point >> in assigning early_gdt_descr except for the first >> assignment, which is done manually. > > What makes you think you can do this? If you don't update the early > boot gdt, they all end up using the Boot CPU one. The problem with this > is that there's a time from start_secondary to switch_to_new_gdt where > the per cpu selector (%fs) and the pda selector (%gs) are those of the > boot CPU. The former isn't a problem but the CPU number is in the > latter, and it's used in that path before we get to the initialisation. You are right, I missed it. However, it only seem to be used in cpu_init, and very early. Sure there are some users _before_ we load the new gdt, but nothing prevents them to be moved after it. (Of course, this patch is wrong anyway). And if we do that, we can even take the %fs loading out of head_32.S Of course, it's only valid if those are indeed the only early users of it. Is there any other use I'm missing? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 15:49 ` Glauber Costa @ 2008-06-09 17:20 ` James Bottomley 2008-06-09 17:23 ` Glauber Costa 0 siblings, 1 reply; 38+ messages in thread From: James Bottomley @ 2008-06-09 17:20 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-kernel, akpm, tglx, mingo, hugh On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote: > James Bottomley wrote: > > On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote: > >> since we use switch_to_new_gdt, there is no point > >> in assigning early_gdt_descr except for the first > >> assignment, which is done manually. > > > > What makes you think you can do this? If you don't update the early > > boot gdt, they all end up using the Boot CPU one. The problem with this > > is that there's a time from start_secondary to switch_to_new_gdt where > > the per cpu selector (%fs) and the pda selector (%gs) are those of the > > boot CPU. The former isn't a problem but the CPU number is in the > > latter, and it's used in that path before we get to the initialisation. > > You are right, I missed it. > > However, it only seem to be used in cpu_init, and very early. Sure there > are some users _before_ we load the new gdt, but nothing prevents them > to be moved after it. (Of course, this patch is wrong anyway). > > And if we do that, we can even take the %fs loading out of head_32.S > Of course, it's only valid if those are indeed the only early users of it. > > Is there any other use I'm missing? Well, %fs loading there is done for the boot CPU. To eliminate that you have to not only verify that start_secondary doesn't use anything in per_cpu areas, but also verify that nothing in start_kernel() up until boot_cpu_init() does ... That's a lot of smp_processor_id() references to convert. James ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 17:20 ` James Bottomley @ 2008-06-09 17:23 ` Glauber Costa 2008-06-09 17:40 ` James Bottomley 0 siblings, 1 reply; 38+ messages in thread From: Glauber Costa @ 2008-06-09 17:23 UTC (permalink / raw) To: James Bottomley; +Cc: linux-kernel, akpm, tglx, mingo, hugh James Bottomley wrote: > On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote: >> James Bottomley wrote: >>> On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote: >>>> since we use switch_to_new_gdt, there is no point >>>> in assigning early_gdt_descr except for the first >>>> assignment, which is done manually. >>> What makes you think you can do this? If you don't update the early >>> boot gdt, they all end up using the Boot CPU one. The problem with this >>> is that there's a time from start_secondary to switch_to_new_gdt where >>> the per cpu selector (%fs) and the pda selector (%gs) are those of the >>> boot CPU. The former isn't a problem but the CPU number is in the >>> latter, and it's used in that path before we get to the initialisation. >> You are right, I missed it. >> >> However, it only seem to be used in cpu_init, and very early. Sure there >> are some users _before_ we load the new gdt, but nothing prevents them >> to be moved after it. (Of course, this patch is wrong anyway). >> >> And if we do that, we can even take the %fs loading out of head_32.S >> Of course, it's only valid if those are indeed the only early users of it. >> >> Is there any other use I'm missing? > > Well, %fs loading there is done for the boot CPU. To eliminate that you > have to not only verify that start_secondary doesn't use anything in > per_cpu areas, but also verify that nothing in start_kernel() up until > boot_cpu_init() does ... That's a lot of smp_processor_id() references > to convert. Yes, after a second look, it would be tricky indeed. But only for cpu0. For all the others, I still think we could get rid of the problem by switching to the new gdt earlier in cpu_init. What do you think? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 03/15] x86: remove early_gdt_descr reference 2008-06-09 17:23 ` Glauber Costa @ 2008-06-09 17:40 ` James Bottomley 0 siblings, 0 replies; 38+ messages in thread From: James Bottomley @ 2008-06-09 17:40 UTC (permalink / raw) To: Glauber Costa; +Cc: linux-kernel, akpm, tglx, mingo, hugh On Mon, 2008-06-09 at 14:23 -0300, Glauber Costa wrote: > James Bottomley wrote: > > On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote: > >> James Bottomley wrote: > >>> On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote: > >>>> since we use switch_to_new_gdt, there is no point > >>>> in assigning early_gdt_descr except for the first > >>>> assignment, which is done manually. > >>> What makes you think you can do this? If you don't update the early > >>> boot gdt, they all end up using the Boot CPU one. The problem with this > >>> is that there's a time from start_secondary to switch_to_new_gdt where > >>> the per cpu selector (%fs) and the pda selector (%gs) are those of the > >>> boot CPU. The former isn't a problem but the CPU number is in the > >>> latter, and it's used in that path before we get to the initialisation. > >> You are right, I missed it. > >> > >> However, it only seem to be used in cpu_init, and very early. Sure there > >> are some users _before_ we load the new gdt, but nothing prevents them > >> to be moved after it. (Of course, this patch is wrong anyway). > >> > >> And if we do that, we can even take the %fs loading out of head_32.S > >> Of course, it's only valid if those are indeed the only early users of it. > >> > >> Is there any other use I'm missing? > > > > Well, %fs loading there is done for the boot CPU. To eliminate that you > > have to not only verify that start_secondary doesn't use anything in > > per_cpu areas, but also verify that nothing in start_kernel() up until > > boot_cpu_init() does ... That's a lot of smp_processor_id() references > > to convert. > Yes, after a second look, it would be tricky indeed. But only for cpu0. > For all the others, I still think we could get rid of the problem by > switching to the new gdt earlier in cpu_init. > > What do you think? Operating a CPU with a bogus GDT is very fragile. You can fix all the current issues with the secondary CPUs, but it gives a critical section within which none of the per_cpu operations will work. It only takes one patch violating this rule and we have a very subtle bug introduced. It looks to me like the better fix might be just to initialise the gdt completely and properly in do_boot_cpu and just have the single switch in head_32.S be the correct one. That way there's no problem critical region. James ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2008-06-11 13:00 UTC | newest] Thread overview: 38+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-09 14:16 [PATCH 0/15] Improve x86 smpboot integration Glauber Costa 2008-06-09 14:16 ` [PATCH 01/15] x86: use stack_start in x86_64 Glauber Costa 2008-06-09 14:16 ` [PATCH 02/15] x86: don't use gdt_page openly Glauber Costa 2008-06-09 14:16 ` [PATCH 03/15] x86: remove early_gdt_descr reference Glauber Costa 2008-06-09 14:16 ` [PATCH 04/15] x86: move x86_64 gdt closer to i386 Glauber Costa 2008-06-09 14:16 ` [PATCH 05/15] x86: use initial_code for i386 Glauber Costa 2008-06-09 14:16 ` [PATCH 06/15] x86: boot secondary cpus through initial_code Glauber Costa 2008-06-09 14:16 ` [PATCH 07/15] x86: clearing io_apic harmless for x86_64 Glauber Costa 2008-06-09 14:16 ` [PATCH 08/15] x86: remove ifdef from stepping Glauber Costa 2008-06-09 14:16 ` [PATCH 09/15] x86: change __setup_vector_irq with setup_vector_irq Glauber Costa 2008-06-09 14:16 ` [PATCH 10/15] x86: provide connect_bsp_APIC for x86_64 Glauber Costa 2008-06-09 14:16 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Glauber Costa 2008-06-09 14:16 ` [PATCH 12/15] x86: change naming to match x86_64 Glauber Costa 2008-06-09 14:16 ` [PATCH 13/15] x86: remove cpu from maps Glauber Costa 2008-06-09 14:16 ` [PATCH 14/15] x86: move cpu_exit_clear to process_32.c Glauber Costa 2008-06-09 14:16 ` [PATCH 15/15] x86: take load_sp0 out of smpboot.c Glauber Costa 2008-06-09 15:23 ` [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus Maciej W. Rozycki 2008-06-09 15:52 ` Glauber Costa 2008-06-09 19:44 ` Glauber Costa 2008-06-09 20:12 ` Maciej W. Rozycki 2008-06-09 20:53 ` Yinghai Lu 2008-06-09 21:00 ` Maciej W. Rozycki 2008-06-10 2:46 ` Maciej W. Rozycki 2008-06-10 5:08 ` Yinghai Lu 2008-06-10 13:00 ` Glauber Costa 2008-06-10 13:30 ` Maciej W. Rozycki 2008-06-10 19:09 ` Yinghai Lu 2008-06-10 19:36 ` Maciej W. Rozycki 2008-06-10 19:49 ` Yinghai Lu 2008-06-11 0:29 ` Maciej W. Rozycki 2008-06-11 2:32 ` Yinghai Lu 2008-06-11 12:57 ` Maciej W. Rozycki 2008-06-09 21:02 ` Glauber Costa 2008-06-09 15:23 ` [PATCH 03/15] x86: remove early_gdt_descr reference James Bottomley 2008-06-09 15:49 ` Glauber Costa 2008-06-09 17:20 ` James Bottomley 2008-06-09 17:23 ` Glauber Costa 2008-06-09 17:40 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox