From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH 1 of 4 RFC] x86/kexec: Change NMI and MCE handling on kexec path Date: Wed, 5 Dec 2012 10:05:01 +0000 Message-ID: <50BF1C4D.6050609@citrix.com> References: <50BF1F2F02000078000AE0E3@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <50BF1F2F02000078000AE0E3@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org On 05/12/12 09:17, Jan Beulich wrote: >>>> On 04.12.12 at 19:16, Andrew Cooper wrote: >> --- a/xen/arch/x86/crash.c >> +++ b/xen/arch/x86/crash.c >> @@ -32,41 +32,97 @@ >> >> static atomic_t waiting_for_crash_ipi; >> static unsigned int crashing_cpu; >> +static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done); >> >> -static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu) >> +/* This becomes the NMI handler for non-crashing CPUs. */ >> +void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs) >> { >> - /* Don't do anything if this handler is invoked on crashing cpu. >> - * Otherwise, system will completely hang. Crashing cpu can get >> - * an NMI if system was initially booted with nmi_watchdog parameter. >> + /* nmi_shootdown_cpus() should ensure that this assertion is correct. */ >> + ASSERT( crashing_cpu != smp_processor_id() ); >> + >> + /* Save crash information and shut down CPU. Attempt only once. */ >> + if ( ! this_cpu(crash_save_done) ) >> + { >> + kexec_crash_save_cpu(); >> + __stop_this_cpu(); >> + >> + this_cpu(crash_save_done) = 1; >> + } >> + >> + /* Poor mans self_nmi(). __stop_this_cpu() has reverted the LAPIC >> + * back to its boot state, so we are unable to rely on the regular >> + * apic_* functions, due to 'x2apic_enabled' being possibly wrong. >> + * (The likely senario is that we have reverted from x2apic mode to >> + * xapic, at which point #GPFs will occur if we use the apic_* >> + * functions) >> + * >> + * The ICR and APIC ID of the LAPIC are still valid even during >> + * software disable (Intel SDM Vol 3, 10.4.7.2), which allows us to >> + * queue up another NMI, to force us back into this loop if we exit. >> */ >> - if ( cpu == crashing_cpu ) >> - return 1; >> - local_irq_disable(); >> + switch ( current_local_apic_mode() ) >> + { >> + u32 apic_id; >> >> - kexec_crash_save_cpu(); >> + case APIC_MODE_X2APIC: >> + apic_id = apic_rdmsr(APIC_ID); >> >> - __stop_this_cpu(); >> + apic_wrmsr(APIC_ICR, APIC_DM_NMI | APIC_DEST_PHYSICAL | ((u64)apic_id << 32)); > As in the description you talk about the LAPIC being (possibly) > disabled here - did you check that this would not #GP in that > case? Yes - we are switching on current_local_apic_mode() which reads the APICBASE MSR to work out which mode we are in, and returns an enum apic_mode. With this information, all the apic_**msr accesses are in case x2apic, and all the apic_mem_** accesses are in case xapic. If the apic_mode is disabled, then we skip trying to set up the next NMI. The patch is sadly rather less legible than I would have hoped, but the code post patch is far more logical to read. >> + break; >> >> - atomic_dec(&waiting_for_crash_ipi); >> + case APIC_MODE_XAPIC: >> + apic_id = GET_xAPIC_ID(apic_mem_read(APIC_ID)); >> + >> + while ( apic_mem_read(APIC_ICR) & APIC_ICR_BUSY ) >> + cpu_relax(); >> + >> + apic_mem_write(APIC_ICR2, apic_id << 24); >> + apic_mem_write(APIC_ICR, APIC_DM_NMI | APIC_DEST_PHYSICAL); >> + break; >> + >> + default: >> + break; >> + } >> >> for ( ; ; ) >> halt(); >> - >> - return 1; >> } >> ... >> ENTRY(machine_check) >> pushq $0 >> movl $TRAP_machine_check,4(%rsp) >> jmp handle_ist_exception >> >> +/* No op trap handler. Required for kexec path. */ >> +ENTRY(trap_nop) > I'd prefer you to re-use an existing IRETQ (e.g. the one you add > below) here - this is not performance critical, and hence there's > no need for it to be aligned. > > Jan Certainly. ~Andrew > >> + iretq >> + >> +/* Enable NMIs. No special register assumptions, and all preserved. */ >> +ENTRY(enable_nmis) >> + push %rax >> + movq %rsp, %rax /* Grab RSP before pushing */ >> + >> + /* Set up stack frame */ >> + pushq $0 /* SS */ >> + pushq %rax /* RSP */ >> + pushfq /* RFLAGS */ >> + pushq $__HYPERVISOR_CS /* CS */ >> + leaq 1f(%rip),%rax >> + pushq %rax /* RIP */ >> + >> + iretq /* Disable the hardware NMI latch */ >> +1: >> + popq %rax >> + retq >> + >> .section .rodata, "a", @progbits >> >> ENTRY(exception_table) -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com